Data steps
Steps are the building blocks of data transformation. They are used in chart queries: each step is a single operation in the query (filtering, aggregating, renaming columns, etc.). Steps run in sequence; the output of one step becomes the input of the next.
The JSON examples in this reference can be copied and adapted for your chart queries. Use the name field to identify the step type and fill in the other properties as described for each step below.
absolutevalue step
absolutevalue stepThis step is meant to compute the absolute value of a given input column.
{
"name": "absolutevalue",
"column": "my-column",
"newColumn": "my-new-column"
}
Example
Input dataset:
Company 1
-33
Company 2
0
Company 3
10
Step configuration:
{
"name": "absolutevalue",
"column": "Value",
"newColumn": "My-absolute-value"
}
Output dataset:
Company 1
-33
33
Company 2
0
0
Company 3
10
10
addmissingdates step
addmissingdates stepAdd missing dates as new rows in a dates column. Exhaustive dates will range between the minimum and maximum date found in the dataset (or in each group if a group by logic is applied - see thereafter).
Added rows will be set to null in columns not referenced in the step configuration.
You should make sure to use a group by logic if you want to add missing dates in independent groups of rows (e.g. you may need to add missing rows for every country found in a "COUNTRY" column). And you should ensure that every date is unique in every group of rows at the specified granularity, else you may get inconsistent results. You can specify "group by" columns in the groups parameter.
An addmissingdates step has the following structure:
Example 1: day granularity without groups
Input dataset:
"2018-01-01T00:00:00.000Z"
75
"2018-01-02T00:00:00.000Z"
80
"2018-01-03T00:00:00.000Z"
82
"2018-01-04T00:00:00.000Z"
83
"2018-01-05T00:00:00.000Z"
80
"2018-01-07T00:00:00.000Z"
86
"2018-01-08T00:00:00.000Z"
79
"2018-01-09T00:00:00.000Z"
76
"2018-01-10T00:00:00.000Z"
79
"2018-01-11T00:00:00.000Z"
75
Here the day "2018-01-06" is missing.
Step configuration:
Output dataset:
"2018-01-01T00:00:00.000Z"
75
"2018-01-02T00:00:00.000Z"
80
"2018-01-03T00:00:00.000Z"
82
"2018-01-04T00:00:00.000Z"
83
"2018-01-05T00:00:00.000Z"
80
"2018-01-06T00:00:00.000Z"
"2018-01-07T00:00:00.000Z"
86
"2018-01-08T00:00:00.000Z"
79
"2018-01-09T00:00:00.000Z"
76
"2018-01-10T00:00:00.000Z"
79
"2018-01-11T00:00:00.000Z"
75
Example 2: day granularity with groups
Input dataset:
France
"2018-01-01T00:00:00.000Z"
75
France
"2018-01-02T00:00:00.000Z"
80
France
"2018-01-03T00:00:00.000Z"
82
France
"2018-01-04T00:00:00.000Z"
83
France
"2018-01-05T00:00:00.000Z"
80
France
"2018-01-07T00:00:00.000Z"
86
France
"2018-01-08T00:00:00.000Z"
79
France
"2018-01-09T00:00:00.000Z"
76
France
"2018-01-10T00:00:00.000Z"
79
France
"2018-01-11T00:00:00.000Z"
85
USA
"2018-01-01T00:00:00.000Z"
69
USA
"2018-01-02T00:00:00.000Z"
73
USA
"2018-01-03T00:00:00.000Z"
73
USA
"2018-01-05T00:00:00.000Z"
75
USA
"2018-01-06T00:00:00.000Z"
70
USA
"2018-01-07T00:00:00.000Z"
76
USA
"2018-01-08T00:00:00.000Z"
73
USA
"2018-01-09T00:00:00.000Z"
70
USA
"2018-01-10T00:00:00.000Z"
72
USA
"2018-01-12T00:00:00.000Z"
78
Here the day "2018-01-06" is missing for "France" rows, and "2018-01-11" and "2018-01-11" are missing for "USA" rows.
Note that "2018-01-12" will not be considered as a missing row for "France" rows, because the latest date found for this group of rows is "2018-01-11" (even though "2018-01-12" is the latest date found for "USA" rows).
Step configuration:
Output dataset:
France
"2018-01-01T00:00:00.000Z"
75
France
"2018-01-02T00:00:00.000Z"
80
France
"2018-01-03T00:00:00.000Z"
82
France
"2018-01-04T00:00:00.000Z"
83
France
"2018-01-05T00:00:00.000Z"
80
France
"2018-01-06T00:00:00.000Z"
France
"2018-01-07T00:00:00.000Z"
86
France
"2018-01-08T00:00:00.000Z"
79
France
"2018-01-09T00:00:00.000Z"
76
France
"2018-01-10T00:00:00.000Z"
79
France
"2018-01-11T00:00:00.000Z"
85
USA
"2018-01-01T00:00:00.000Z"
69
USA
"2018-01-02T00:00:00.000Z"
73
USA
"2018-01-03T00:00:00.000Z"
73
USA
"2018-01-04T00:00:00.000Z"
USA
"2018-01-05T00:00:00.000Z"
75
USA
"2018-01-06T00:00:00.000Z"
70
USA
"2018-01-07T00:00:00.000Z"
76
USA
"2018-01-08T00:00:00.000Z"
73
USA
"2018-01-09T00:00:00.000Z"
70
USA
"2018-01-10T00:00:00.000Z"
72
USA
"2018-01-11T00:00:00.000Z"
USA
"2018-01-12T00:00:00.000Z"
78
Example 3: month granularity
Input dataset:
"2019-01-01T00:00:00.000Z"
74
"2019-02-01T00:00:00.000Z"
73
"2019-03-01T00:00:00.000Z"
68
"2019-04-01T00:00:00.000Z"
71
"2019-06-01T00:00:00.000Z"
74
"2019-07-01T00:00:00.000Z"
74
"2019-08-01T00:00:00.000Z"
73
"2019-09-01T00:00:00.000Z"
72
"2019-10-01T00:00:00.000Z"
75
"2019-12-01T00:00:00.000Z"
76
Here "2019-05" and "2019-11" are missing.
Step configuration:
Output dataset:
"2019-01-01T00:00:00.000Z"
74
"2019-02-01T00:00:00.000Z"
73
"2019-03-01T00:00:00.000Z"
68
"2019-04-01T00:00:00.000Z"
71
"2019-05-01T00:00:00.000Z"
"2019-06-01T00:00:00.000Z"
74
"2019-07-01T00:00:00.000Z"
74
"2019-08-01T00:00:00.000Z"
73
"2019-09-01T00:00:00.000Z"
72
"2019-10-01T00:00:00.000Z"
75
"2019-11-01T00:00:00.000Z"
"2019-12-01T00:00:00.000Z"
76
aggregate step
aggregate stepPerform aggregations on one or several columns. Available aggregation functions are sum, average, count, count distinct, min, max, first, last.
An aggreation step has the following structure:
Example 1: keepOriginalGranularity set to false
Input dataset:
Label 1
Group 1
13
10
Label 2
Group 1
7
21
Label 3
Group 1
20
4
Label 4
Group 2
1
17
Label 5
Group 2
9
12
Label 6
Group 2
5
2
Step configuration:
Output dataset:
Group 1
40
35
13.333333
Group 2
16
31
5.333333
Example 2: keepOriginalGranularity set to true
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 1
Group 1
40
Label 2
Group 1
40
Label 3
Group 1
40
Label 4
Group 2
16
Label 5
Group 2
16
Label 6
Group 2
16
append step
append stepAppends to the current dataset, one or several datasets resulting from other pipelines. WeaverBird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.
Example
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
dataset1 (saved in the application Vuex store):
Label 3
Group 1
20
Label 4
Group 2
1
dataset2 (saved in the application Vuex store):
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
argmax step
argmax stepGet row(s) matching the maximum value in a given column, by group if groups is specified.
Example 1: without groups
groupsInput dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 3
Group 1
20
Example 2: with groups
groupsInput dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 3
Group 1
20
Label 5
Group 2
10
argmin step
argmin stepGet row(s) matching the minimum value in a given column, by group if groups is specified.
Example 1: without groups
groupsInput dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 4
Group 2
1
Example 2: with groups
groupsInput dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 2
Group 1
7
Label 4
Group 2
1
comparetext step
comparetext stepCompares 2 string columns and returns true if the string values are equal, and false oteherwise. The comparison is case-sensitive (see examples below).
Example
Input dataset:
France
Fr
France
France
France
france
France
England
France
USA
Step configuration:
Output dataset:
France
Fr
false
France
France
true
France
france
false
France
England
false
France
USA
false
concatenate step
concatenate stepThis step allows to concatenate several columns using a separator.
Example
Input dataset:
Company 1
Group 1
13
Company 2
Group 1
7
Company 3
Group 1
20
Company 4
Group 2
1
Company 5
Group 2
10
Company 6
Group 2
5
Step configuration:
Output dataset:
Company 1
Group 1
13
Company 1 - Group 1
Company 2
Group 1
7
Company 2 - Group 1
Company 3
Group 1
20
Company 3 - Group 1
Company 4
Group 2
1
Company 4 - Group 2
Company 5
Group 2
10
Company 5 - Group 2
Company 6
Group 2
5
Company 6 - Group 2
convert step
convert stepThis step allows to convert columns data types.
In a effort to harmonize as much as possible the conversion behaviors, for some cases, the Sql translator implements casting otherwise than the CAST AS method.
Precisely, when casting float to integer, the default behaviour rounds the result, other languages truncate it. That's why the use of TRUNCATE was implemented when converting float to int. The same implementation was done when converting strings to int (for date represented as string). As for the conversion of date to int, we handled it by assuming the dataset's timestamp is in TIMESTAMP_NTZ format.
Example
Input dataset:
Company 1
'13'
Company 2
'7'
Company 3
'20'
Company 4
'1'
Company 5
'10'
Company 6
'5'
Step configuration:
Output dataset:
Company 1
13
Company 2
7
Company 3
20
Company 4
1
Company 5
10
Company 6
5
cumsum step
cumsum stepThis step allows to compute the cumulated sum of value columns based on a reference column (usually dates) to be sorted by ascending order for the needs of the computation. The computation can be scoped by group if needed.
The toCumSum parameter takes as input a list of 2-elements lists in the form ['valueColumn', 'newColumn'].
Example 1: Basic usage
Input dataset:
2019-01
2
2019-02
5
2019-03
3
2019-04
8
2019-05
9
2019-06
6
Step configuration:
Output dataset:
2019-01
2
2
2019-02
5
7
2019-03
3
10
2019-04
8
18
2019-05
9
27
2019-06 6
6
33
Example 2: With more advanced options
Input dataset:
France
2019-01
2
France
2019-02
5
France
2019-03
3
France
2019-04
8
France
2019-05
9
France
2019-06 6
6
USA
2019-01
10
USA
2019-02
6
USA
2019-03
6
USA
2019-04
4
USA
2019-05
8
USA
2019-06 6
7
Step configuration:
Output dataset:
France
2019-01
2
2
France
2019-02
5
7
France
2019-03
3
10
France
2019-04
8
18
France
2019-05
9
27
France
2019-06 6
6
33
USA
2019-01
10
10
USA
2019-02
6
16
USA
2019-03
6
22
USA
2019-04
4
26
USA
2019-05
8
34
USA
2019-06 6
7
41
custom step
custom stepThis step allows to define a custom query that can't be expressed using the other existing steps.
Example: using Mongo query language
Input dataset:
Company 1
Group 1
13
Company 2
Group 1
7
Company 3
Group 1
20
Company 4
Group 2
1
Company 5
Group 2
10
Company 6
Group 2
5
Step configuration:
Output dataset:
Company 1
Group 1
13
Company 1 - Group 1
Company 2
Group 1
7
Company 2 - Group 1
Company 3
Group 1
20
Company 3 - Group 1
Company 4
Group 2
1
Company 4 - Group 2
Company 5
Group 2
10
Company 5 - Group 2
Company 6
Group 2
5
Company 6 - Group 2
dateextract step
dateextract stepExtract date information (eg. day, week, year etc.). The following information can be extracted:
year: extract 'year' from date,month: extract 'month' from date,day: extract 'day of month' from date,week': extract 'week number' (ranging from 0 to 53) from date,quarter: extract 'quarter number' from date (1 for Jan-Feb-Mar)dayOfWeek: extract 'day of week' (ranging from 1 for Sunday to 7 for Staurday) from date,dayOfYear: extract 'day of year' from date,isoYear: extract 'year number' in ISO 8601 format (ranging from 1 to 53) from date.isoWeek: extract 'week number' in ISO 8601 format (ranging from 1 to 53) from date.isoDayOfWeek: extract 'day of week' in ISO 8601 format (ranging from 1 for Monday to 7 for Sunday) from date,firstDayOfYear: calendar date corresponding to the first day (1st of January) of the year ,firstDayOfMonth: calendar date corresponding to the first day of the month,firstDayOfWeek: calendar date corresponding to the first day of the week,firstDayOfQuarter: calendar date corresponding to the first day of the quarter,firstDayOfIsoWeek: calendar date corresponding to the first day of the week in ISO 8601 format,currentDay: calendar date of the target date,previousDay: calendar date one day before the target date,firstDayOfPreviousYear: calendar date corresponding to the first day (1st of January) of the previous year,firstDayOfPreviousMonth: calendar date corresponding to the first day of the previous month,firstDayOfPreviousWeek: calendar date corresponding to the first day of the previous week,firstDayOfPreviousQuarter: calendar date corresponding to the first day of the previous quarter,firstDayOfPreviousISOWeek: calendar date corresponding to the first day of the previous ISO week,previousYear: extract previous 'year number' from datepreviousMonth: extract previous 'month number' from datepreviousWeek: extract previous 'week number' from datepreviousQuarter: extract previous 'quarter number' from datepreviousISOWeek: extract previous 'week number' in ISO 8601 format (ranging from 1 for Monday to 7 for Sunday) from datehour: extract 'hour' from date,minutes: extract 'minutes' from date,seconds: extract 'seconds' from date,milliseconds: extract 'milliseconds' from date,
Here's an example of such a step:
Example
Input dataset:
2019-10-30T00:00:00.000Z
2019-10-15T00:00:00.000Z
2019-10-01T00:00:00.000Z
2019-09-30T00:00:00.000Z
2019-09-15T00:00:00.000Z
2019-09-01T00:00:00.000Z
Step configuration:
Output dataset:
2019-10-30T00:00:00.000Z
2019
10
30
2019-10-15T00:00:00.000Z
2019
10
15
2019-10-01T00:00:00.000Z
2019
10
1
2019-09-30T00:00:00.000Z
2020
10
30
2019-09-15T00:00:00.000Z
2020
10
15
2019-09-01T00:00:00.000Z
2020
10
1
dategranularity step
dategranularity stepExtract date information (eg. day, week, year etc.) in a column intended for aggregation. The following granularities are supported:
year: calendar date corresponding to the first day (1st of January) of the yearquarter: calendar date corresponding to the first day of the quartermonth: calendar date corresponding to the first day of the monthweek: calendar date corresponding to the first day of the week (sunday)isoWeek: calendar date corresponding to the first day of the week (monday)day: calendar date corresponding to the first hour of the day
Here's an example of such a step:
Example
Input dataset:
2019-10-30T00:00:00.000Z
2019-10-15T00:00:00.000Z
2019-10-01T00:00:00.000Z
2019-09-30T05:11:31.000Z
2019-09-15T00:00:00.000Z
2019-09-01T00:00:00.000Z
Step configuration:
Output dataset:
2019-10-01T00:00:00.000Z
2019-10-01T00:00:00.000Z
2019-10-01T00:00:00.000Z
2019-09-01T00:00:00.000Z
2019-09-01T00:00:00.000Z
2019-09-01T00:00:00.000Z
delete step
delete stepDelete a column.
Example
Input dataset:
Company 1
Group 1
13
Company 1 - Group 1
Company 2
Group 1
7
Company 2 - Group 1
Company 3
Group 1
20
Company 3 - Group 1
Company 4
Group 2
1
Company 4 - Group 2
Company 5
Group 2
10
Company 5 - Group 2
Company 6
Group 2
5
Company 6 - Group 2
Step configuration:
Output dataset:
13
Company 1 - Group 1
7
Company 2 - Group 1
20
Company 3 - Group 1
1
Company 4 - Group 2
10
Company 5 - Group 2
5
Company 6 - Group 2
dissolve step
dissolve stepGeographically dissolve data
Example without aggregations
Input dataset:
Country 1
City 1
Polygon
Country 2
City 2
Polygon
Country 2
City 3
Polygon
Country 1
City 4
Polygon
Country 2
City 5
Polygon
Country 1
City 6
Polygon
Step configuration:
Output dataset:
Country 1
MultiPolygon
Country 2
MultiPolygon
Example with aggregations
Input dataset:
Country 1
City 1
Polygon
100_000
Country 2
City 2
Polygon
50_000
Country 2
City 3
Polygon
200_000
Country 1
City 4
Polygon
30_000
Country 2
City 5
Polygon
25_000
Country 1
City 6
Polygon
10_000
Step configuration:
Output dataset:
Country 1
MultiPolygon
140_000
Country 2
MultiPolygon
275_000
domain step
domain stepThis step is meant to select a specific domain (using MongoDB terminology).
duplicate step
duplicate stepThis step is meant to duplicate a column.
Example
Input dataset:
Company 1
13
Company 2
0
Company 3
20
Step configuration:
Output dataset:
Company 1
13
Company 1
Company 2
0
Company 2
Company 3
20
Company 3
duration step
duration stepCompute the duration (in days, hours, minutes or seconds) between 2 dates in a new column.
Example 1: duration in days
Input dataset:
"2020-01-01T00:00:00.000Z"
"2020-01-31T00:00:00.000Z"
"2020-01-01T00:00:00.000Z"
"2020-12-31T00:00:00.000Z"
Step configuration:
Output dataset:
"2020-01-01T00:00:00.000Z"
"2020-01-31T00:00:00.000Z"
30
"2020-01-01T00:00:00.000Z"
"2020-12-31T00:00:00.000Z"
365
Example 2: duration in minutes
Input dataset:
"2020-01-01T14:00:00.000Z"
"2020-01-31T15:00:00.000Z"
"2020-01-01T15:00:00.000Z"
"2020-12-31T20:00:00.000Z"
Step configuration:
Output dataset:
"2020-01-01T14:00:00.000Z"
"2020-01-31T15:00:00.000Z"
60
"2020-01-01T15:00:00.000Z"
"2020-12-31T20:00:00.000Z"
300
evolution step
evolution stepUse this step if you need to compute the row-by-row evolution of a value column, based on a date column. It will output 2 columns: one for the evolution in absolute value, the other for the evolution in percentage.
You must be careful that the computation is scoped so that there are no dates duplicates (so that any date finds no more than one previous date). That means that you may need to specify "group by" columns to make any date unique inside each group. You should specify those columns in the indexColumns parameter.
Example 1: Basic configuration - evolution in absolute value
Input dataset:
2019-06
79
2019-07
81
2019-08
77
2019-09
75
2019-11
78
2019-12
88
Step configuration:
Output dataset:
2019-06
79
2019-07
81
2
2019-08
77
-4
2019-09
75
-2
2019-11
78
2019-12
88
10
Example 2: Basic configuration - evolution in percentage
Input dataset:
2019-06
79
2019-07
81
2019-08
77
2019-09
75
2019-11
78
2019-12
88
Step configuration:
Output dataset:
2019-06
79
2019-07
81
0.02531645569620253
2019-08
77
-0.04938271604938271
2019-09
75
-0.025974025974025976
2019-11
78
2019-12
88
0.1282051282051282
Example 3: Error on duplicate dates
If 'COUNTRY' is not specified as indexColumn, the computation will not be scoped by country. Then there are duplicate dates in the "DATE" columns which is prohibited and will lead to an error.
Input dataset:
2014-12
France
79
2015-12
France
81
2016-12
France
77
2017-12
France
75
2014-12
USA
74
2015-12
USA
74
2016-12
USA
73
2017-12
USA
72
Step configuration:
Output dataset:
With the mongo translator, you will get an error at row-level as shown below:
2014-12
France
79
2015-12
France
81
Error ...
2016-12
France
77
Error ...
2017-12
France
75
Error ...
2014-12
USA
74
2015-12
USA
74
Error ...
2016-12
USA
73
Error ...
2017-12
USA
72
Error ...
The pandas translator will just return an error, and you will not get any data.
Example 4: Complete configuration with index columns
Input dataset:
2014-12
France
79
2015-12
France
81
2016-12
France
77
2017-12
France
75
2019-12
France
78
2020-12
France
88
2014-12
USA
74
2015-12
USA
74
2016-12
USA
73
2017-12
USA
72
2018-11
USA
75
2020-12
USA
76
Step configuration:
Output dataset:
2014-12
France
79
2015-12
France
81
2
2016-12
France
77
-4
2017-12
France
75
-2
2019-12
France
78
2020-12
France
88
10
2014-12
USA
74
2015-12
USA
74
0
2016-12
USA
73
-1
2017-12
USA
72
-1
2018-11
USA
75
3
2020-12
USA
76
fillna step
fillna stepReplace null values by a given value in specified columns.
Example
Input dataset:
Company 1
Group 1
13
Company 2
Group 1
12
Company 3
Group 1
20
40
Company 4
Group 2
1
Company 5
Group 2
38
Company 6
Group 2
5
4
Step configuration:
Output dataset:
Company 1
Group 1
13
0
Company 2
Group 1
0
12
Company 3
Group 1
20
40
Company 4
Group 2
1
0
Company 5
Group 2
0
38
Company 6
Group 2
5
4
filter step
filter stepFilter out lines that don't match a filter definition.
operator is optional, and defaults to eq. Allowed operators are eq, ne, gt, ge, lt, le, in, nin, matches, notmatches isnull or notnull.
value can be an arbitrary value depending on the selected operator (e.g a list when used with the in operator, or null when used with the isnull operator).
matches and notmatches operators are used to test value against a regular expression.
Conditions can be grouped and nested with logical operators and and or.
Relative dates
Date values can be relative to the moment to the moment when the query is executed. This is expressed by using a RelativeDate object instead of the value, of the form:
formula step
formula stepAdd a computation based on a formula. Usually column names do not need to be escaped, unless they include whitespaces, in which case you'll need to use brackets '[]' (e.g. [myColumn]). Any string escaped with quotes (', ", ''', """) will be considered a string literal.
Supported operators
The following operators are supported by the formula step (note that a value can be a column name or a literal, such as 42 or foo).
+: Does an addition of two numeric values. See theconcatenatestep to append strings-: Does an substraction of two numeric values. See thereplacestep to remove a part of a string*: Multiplies two numeric values./: Divides a numeric value by another. Divisions by zero will returnnull.%: Returns the rest of an integer division. Divisions by zero will returnnull.
Example 1: Basic usage
Input dataset:
Label 1
10
2
3
1
Label 2
1
13
7
3
Label 3
5
20
5
2
Step configuration:
Output dataset:
Label 1
10
2
3
1
2
Label 2
1
13
7
3
-4
Label 3
5
20
5
2
1
Example 2: Column name with whitespaces
Input dataset:
Label 1
10
2
3
1
Label 2
1
13
7
3
Label 3
5
20
5
2
Step configuration:
Output dataset:
Label 1
10
2
3
1
2
Label 2
1
13
7
3
-4
Label 3
5
20
5
2
1
hierarchy step
hierarchy stepHierarchy for geographical data.
This step dissolves data for every hierarchy level, and adds a hierarchy level column containing a level (with 0 being the lowest granularity, i.e. the highest level).
Example
Input dataset:
Country 1
City 1
Polygon
100_000
Country 2
City 2
Polygon
50_000
Country 2
City 3
Polygon
200_000
Country 1
City 4
Polygon
30_000
Country 2
City 5
Polygon
25_000
Country 1
City 6
Polygon
10_000
Step configuration:
Output dataset:
Country 1
City 1
Polygon
100_000
2
Country 2
City 2
Polygon
50_000
2
Country 2
City 3
Polygon
200_000
2
Country 1
City 4
Polygon
30_000
2
Country 2
City 5
Polygon
25_000
2
Country 1
City 6
Polygon
10_000
2
Country 1
null
MultiPolygon
null
1
Country 2
null
MultiPolygon
null
1
null
null
MultiPolygon
null
0
ifthenelse step
ifthenelse stepCreates a new column, which values will depend on a condition expressed on existing columns.
The condition is expressed in the if parameter with a condition object, which is the same object expected by the condition parameter of the filter step). Conditions can be grouped and nested with logical operators and and or.
The then parameter only supports a string, that will be interpreted as a formula (cf. formula step). If you want it to be interpreted striclty as a string and not a formula, you must escape the string with quotes (e.g. '"this is a text"').
if...then...else blocks can be nested as the else parameter supports either a string that will be interpreted as a formula (cf. formula step), or a nested if if...then...else object.
Example
Input dataset:
Label 1
-2
Label 2
2
Label 3
0
Step configuration:
Output dataset:
Label 1
-2
2
Label 2
5
5
Label 3
0
zero
join step
join stepJoins a dataset to the current dataset, i.e. brings columns from the former into the latter, and matches rows based on columns correspondance. It is similar to a JOIN clause in SQL, or to a VLOOKUP in excel. The joined dataset is the result from the query of the right_pipeline.
The join type can be:
'left': will keep every row of the current dataset and fill unmatched rows with
nullvalues,'inner': will only keep rows that match rows of the joined dataset.
In the on parameter, you must specify 1 or more column couple(s) that will be compared to determine rows correspondance between the 2 datasets. The first element of a couple is for the current dataset column, and the second for the corresponding column in the right dataset to be joined. If you specify more than 1 couple, the matching rows will be those that find a correspondance between the 2 datasets for every column couple specified (logical 'AND').
Weaverbird allows you to save pipelines referenced by name in the Vuex store of the application. You can then call them by their unique names in this step.
Example 1: Left join with one column couple as on parameter
on parameterInput dataset:
Label 1
13
Label 2
7
Label 3
20
Label 4
1
Label 5
1
Label 6
1
rightDataset (saved in the application Vuex store):
Label 1
Group 1
Label 2
Group 1
Label 3
Group 2
Label 4
Group 2
Step configuration:
Output dataset:
Label 1
13
Group 1
Label 2
7
Group 1
Label 3
20
Group 2
Label 4
1
Group 2
Label 5
1
Label 6
1
Example 2: inner join with different column names in the on parameter
on parameterInput dataset:
Label 1
13
Label 2
7
Label 3
20
Label 4
1
Label 5
1
Label 6
1
rightDataset (saved in the application Vuex store):
Label 1
Group 1
Label 2
Group 1
Label 3
Group 2
Label 4
Group 2
Step configuration:
Output dataset:
Label 1
13
Label 1
Group 1
Label 2
7
Label 2
Group 1
Label 3
20
Label 3
Group 2
Label 4
1
Label 4
Group 2
fromdate step
fromdate stepConverts a date column into a string column based on a specified format.
Example
Input dataset:
Company 1
2019-10-06T00:00:00.000Z
13
Company 1
2019-10-07T00:00:00.000Z
7
Company 1
2019-10-08T00:00:00.000Z
20
Company 2
2019-10-06T00:00:00.000Z
1
Company 2
2019-10-07T00:00:00.000Z
10
Company 2
2019-10-08T00:00:00.000Z
5
Step configuration:
Output dataset:
Company 1
06/10/2019
13
Company 1
07/10/2019
7
Company 1
08/10/2019
20
Company 2
06/10/2019
1
Company 2
07/10/2019
10
Company 2
08/10/2019
5
lowercase step
lowercase step⚠️ Mongo's $toLower operator does not support accents. If you have accents you need to lowercase with Mongo, use a replacetext step after lowercase.
Converts a string column to lowercase.
Example:
Input dataset:
LABEL 1
Group 1
13
LABEL 2
Group 1
7
LABEL 3
Group 1
20
Step configuration:
Output dataset:
label 1
Group 1
13
label 2
Group 1
7
label 3
Group 1
20
movingaverage step
movingaverage stepCompute the moving average based on a value column, a reference column to sort (usually a date column) and a moving window (in number of rows i.e. data points). If needed, the computation can be performed by group of rows. The computation result is added in a new column.
Example 1: Basic usage
Input dataset:
2018-01-01
75
2018-01-02
80
2018-01-03
82
2018-01-04
83
2018-01-05
80
2018-01-06
86
2018-01-07
79
2018-01-08
76
Step configuration:
Output dataset:
2018-01-01
75
null
2018-01-02
80
77.5
2018-01-03
82
81
2018-01-04
83
82.5
2018-01-05
80
81.5
2018-01-06
86
83
2018-01-07
79
82.5
2018-01-08
76
77.5
Example 2: with groups and custom newColumnName
Input dataset:
France
2018-01-01
75
France
2018-01-02
80
France
2018-01-03
82
France
2018-01-04
83
France
2018-01-05
80
France
2018-01-06
86
USA
2018-01-01
69
USA
2018-01-02
73
USA
2018-01-03
73
USA
2018-01-04
75
USA
2018-01-05
70
USA
2018-01-06
76
Step configuration:
Output dataset:
France
2018-01-01
75
null
France
2018-01-02
80
null
France
2018-01-03
82
79
France
2018-01-04
83
81.7
France
2018-01-05
80
81.7
France
2018-01-06
86
83
USA
2018-01-01
69
null
USA
2018-01-02
73
null
USA
2018-01-03
73
71.7
USA
2018-01-04
75
73.7
USA
2018-01-05
70
72.7
USA
2018-01-06
76
73.7
percentage step
percentage stepCompute the percentage of total, i.e. for every row the value in column divided by the total as the sum of every values in column. The computation can be performed by group if specified. The result is written in a new column.
Example:
Input dataset:
Label 1
Group 1
5
Label 2
Group 1
10
Label 3
Group 1
15
Label 4
Group 2
2
Label 5
Group 2
7
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 1
Group 1
5
0.167
Label 2
Group 1
10
0.333
Label 3
Group 1
15
0.5
Label 4
Group 2
2
0.143
Label 5
Group 2
7
0.5
Label 6
Group 2
5
0.357
pivot step
pivot stepPivot rows into columns around a given index (expressed as a combination of column(s)). Values to be used as new column names are found in the column column_to_pivot. Values to populate new columns are found in the column value_column. The function used to aggregate data (when several rows are found by index group) must be among sum, avg, count, min or max.
Example:
Input dataset:
Label 1
Country1
13
Label 2
Country1
7
Label 3
Country1
20
Label 1
Country2
1
Label 2
Country2
10
Label 3
Country2
5
label 3
Country2
1
Step configuration:
Output dataset:
Label 1
13
1
Label 2
7
10
Label 3
20
6
statistics step
statistics stepCompute statistics of a column.,
Example:
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
9.33333
6
8.5
rank step
rank stepThis step allows to compute a rank column based on a value column that can be sorted in ascending or descending order. The ranking can be computed by group.
There are 2 ranking methods available, that you will understand easily through those examples:
standard: input = [10, 20, 20, 20, 25, 25, 30] => ranking = [1, 2, 2, 2, 5, 5, 7]dense: input = [10, 20, 20, 20, 25, 25, 30] => ranking = [1, 2, 2, 2, 3, 3, 4]
(The dense method is basically the same as the standard method, but rank always increases by 1 at most).
Example 1: Basic usage
Input dataset:
FRANCE
15
FRANCE
5
FRANCE
10
FRANCE
20
FRANCE
10
FRANCE
15
USA
20
USA
30
USA
20
USA
25
USA
15
USA
20
Step configuration:
Output dataset:
USA
30
1
USA
25
2
FRANCE
20
3
USA
20
3
USA
20
3
USA
20
3
FRANCE
15
7
FRANCE
15
7
USA
15
7
FRANCE
10
10
FRANCE
10
10
FRANCE
5
12
Example 2: With more options
Input dataset:
FRANCE
15
FRANCE
5
FRANCE
10
FRANCE
20
FRANCE
10
FRANCE
15
USA
20
USA
30
USA
20
USA
25
USA
15
USA
20
Step configuration:
Output dataset:
FRANCE
5
1
FRANCE
10
2
FRANCE
10
2
FRANCE
15
3
FRANCE
15
3
FRANCE
20
4
USA
15
1
USA
20
2
USA
20
2
USA
20
2
USA
25
3
USA
30
4
rename step
rename stepRename one or several columns. The toRename parameter takes as input a list of 2-elements lists in the form ['oldColumnName', 'newColumnName'].
Example:
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
replace step
replace stepReplace one or several values in a column.
A replace step has the following strucure:
Example
Input dataset:
Company 1
Fr
Company 2
UK
Step configuration:
Output dataset:
Company 1
France
Company 2
United Kingdom
replacetext step
replacetext stepReplace a substring in a column.
A replace-text step has the following structure:
Example
Input dataset:
Company 1
Fr is boring
Company 2
UK
Step configuration:
Output dataset:
Company 1
France is boring
Company 2
UK
rollup step
rollup stepUse this step if you need to compute aggregated data at every level of a hierarchy, specified as a series of columns from top to bottom level. The output data structure stacks the data of every level of the hierarchy, specifying for every row the label, level and parent in dedicated columns.
Aggregated rows can be computed with using either sum, average, count, count distinct, min, max, first or last.
Example 1 : Basic configuration
Input dataset:
Paris
France
Europe
2018
10
Bordeaux
France
Europe
2018
5
Barcelona
Spain
Europe
2018
8
Madrid
Spain
Europe
2018
3
Boston
USA
North America
2018
12
New-York
USA
North America
2018
21
Montreal
Canada
North America
2018
10
Ottawa
Canada
North America
2018
7
Paris
France
Europe
2019
13
Bordeaux
France
Europe
2019
8
Barcelona
Spain
Europe
2019
11
Madrid
Spain
Europe
2019
6
Boston
USA
North America
2019
15
New-York
USA
North America
2019
24
Montreal
Canada
North America
2019
10
Ottawa
Canada
North America
2019
13
Step configuration:
Output dataset:
Europe
Europe
CONTINENT
COUNTRY
64
North America
North America
CONTINENT
COUNTRY
112
France
Europe
France
COUNTRY
CITY
Europe
36
Spain
Europe
Spain
COUNTRY
CITY
Europe
28
USA
North America
USA
COUNTRY
CITY
North America
72
Canada
North America
Canada
COUNTRY
CITY
North America
40
Paris
France
Europe
Paris
CITY
France
23
Bordeaux
France
Europe
Bordeaux
CITY
France
13
Barcelona
Spain
Europe
Barcelona
CITY
Spain
19
Madrid
Spain
Europe
Madrid
CITY
Spain
9
Boston
USA
North America
Boston
CITY
USA
27
New-York
USA
North America
New-York
CITY
USA
45
Montreal
Canada
North America
Montreal
CITY
Canada
20
Ottawa
Canada
North America
Ottawa
CITY
Canada
20
Example 2 : Configuration with optional parameters
Input dataset:
Paris
France
Europe
2018
10
1
Bordeaux
France
Europe
2018
5
1
Barcelona
Spain
Europe
2018
8
1
Madrid
Spain
Europe
2018
3
1
Boston
USA
North America
2018
12
1
New-York
USA
North America
2018
21
1
Montreal
Canada
North America
2018
10
1
Ottawa
Canada
North America
2018
7
1
Paris
France
Europe
2019
13
1
Bordeaux
France
Europe
2019
8
1
Barcelona
Spain
Europe
2019
11
1
Madrid
Spain
Europe
2019
6
1
Boston
USA
North America
2019
15
1
New-York
USA
North America
2019
24
1
Montreal
Canada
North America
2019
10
1
Ottawa
Canada
North America
2019
13
1
Step configuration:
Output dataset:
North America
2018
Europe
CONTINENT
COUNTRY
26
6.5
4
North America
2018
North America
CONTINENT
COUNTRY
50
12.5
4
France
Europe
2018
France
COUNTRY
CITY
Europe
15
7.5
2
Spain
Europe
2018
Spain
COUNTRY
CITY
Europe
11
5.5
2
USA
North America
2018
USA
COUNTRY
CITY
North America
33
16.5
2
Canada
North America
2018
Canada
COUNTRY
CITY
North America
17
8.5
2
Paris
France
Europe
2018
Paris
CITY
France
10
10
1
Bordeaux
France
Europe
2018
Bordeaux
CITY
France
5
5
1
Barcelona
Spain
Europe
2018
Barcelona
CITY
Spain
8
8
1
Madrid
Spain
Europe
2018
Madrid
CITY
Spain
3
3
1
Boston
USA
North America
2018
Boston
CITY
USA
12
12
1
New-York
USA
North America
2018
New-York
CITY
USA
21
21
1
Montreal
Canada
North America
2018
Montreal
CITY
Canada
10
10
1
Ottawa
Canada
North America
2018
Ottawa
CITY
Canada
7
7
1
North America
2019
Europe
CONTINENT
COUNTRY
38
9.5
4
North America
2019
North America
CONTINENT
COUNTRY
62
15.5
4
France
Europe
2019
France
COUNTRY
CITY
Europe
21
10.5
2
Spain
Europe
2019
Spain
COUNTRY
CITY
Europe
17
8.5
2
USA
North America
2019
USA
COUNTRY
CITY
North America
39
19.5
2
Canada
North America
2019
Canada
COUNTRY
CITY
North America
23
11.5
2
Paris
France
Europe
2019
Paris
CITY
France
13
13
1
Bordeaux
France
Europe
2019
Bordeaux
CITY
France
8
8
1
Barcelona
Spain
Europe
2019
Barcelona
CITY
Spain
11
11
1
Madrid
Spain
Europe
2019
Madrid
CITY
Spain
6
6
1
Boston
USA
North America
2019
Boston
CITY
USA
15
15
1
New-York
USA
North America
2019
New-York
CITY
USA
24
24
1
Montreal
Canada
North America
2019
Montreal
CITY
Canada
10
10
1
Ottawa
Canada
North America
2019
Ottawa
CITY
Canada
13
13
1
select step
select stepSelect a column. The default is to keep every columns of the input domain. If the select is used, it will only keep selected columns in the output.
Example
Input dataset:
Company 1
Group 1
13
Company 1 - Group 1
Company 2
Group 1
7
Company 2 - Group 1
Company 3
Group 1
20
Company 3 - Group 1
Company 4
Group 2
1
Company 4 - Group 2
Company 5
Group 2
10
Company 5 - Group 2
Company 6
Group 2
5
Company 6 - Group 2
Step configuration:
Output dataset:
13
Company 1 - Group 1
7
Company 2 - Group 1
20
Company 3 - Group 1
1
Company 4 - Group 2
10
Company 5 - Group 2
5
Company 6 - Group 2
sort step
sort stepSort values in one or several columns. Order can be either 'asc' or 'desc'. When sorting on several columns, order of columns specified in columns matters.
Example
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 3
Group 1
20
Label 1
Group 1
13
Label 2
Group 1
7
Label 5
Group 2
10
Label 6
Group 2
5
Label 4
Group 2
1
split step
split stepSplit a string column into several columns based on a delimiter.
Example 1
Input dataset:
Label 1 - Group 1 - France
13
Label 2 - Group 1 - Spain
7
Label 3 - Group 1 - USA
20
Label 4 - Group 2 - France
1
Label 5 - Group 2 - Spain
10
Label 6 - Group 2 - USA
5
Step configuration:
Output dataset:
Label 1
Group 1
Spain
13
Label 2
Group 1
USA
7
Label 3
Group 1
France
20
Label 4
Group 2
USA
1
Label 5
Group 2
France
10
Label 6
Group 2
Spain
5
Example 2: keeping less columns
Input dataset:
Label 1 - Group 1 - France
13
Label 2 - Group 1 - Spain
7
Label 3 - Group 1 - USA
20
Label 4 - Group 2 - France
1
Label 5 - Group 2 - Spain
10
Label 6 - Group 2 - USA
5
Step configuration:
Output dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
simplify step
simplify stepSimplifies geographical data.
When simplifying your data, every point that is closer than a specific distance to the previous one is suppressed. This step can be useful if you have a very precise shape for a country (such as one-meter precision), but want to quickly draw a map chart. In that case, you may want to simplify your data.
After simplification, no points will be closer than tolerance. The unit depends on data's projection and on its unit, but in general, it's expressed in meters for CRS projections. For more details, see the GeoPandas documentation.
Step configuration:
substring step
substring stepExtract a substring in a string column. The substring begins at index start_index (beginning at 1) and stops at end_index. You can specify negative indexes, in such a case the index search will start from the end of the string (with -1 being the last index of the string). Please refer to the examples below for illustration. Neither start_index nor end_index can be equal to 0.
Example 1: positive start_index and end_index
start_index and end_indexInput dataset:
foo
13
overflow
7
some_text
20
a_word
1
toucan
10
toco
5
Step configuration:
foo
13
foo
overflow
7
over
some_text
20
some
a_word
1
a_wo
toucan
10
touc
toco
5
toco
Example 2: start_index is positive and end_index is negative
start_index is positive and end_index is negativeInput dataset:
foo
13
overflow
7
some_text
20
a_word
1
toucan
10
toco
5
Step configuration:
Output dataset:
foo
13
o
overflow
7
verflo
some_text
20
ome_tex
a_word
1
_wor
toucan
10
ouca
toco
5
oc
Example 3: start_index and end_index are negative
start_index and end_index are negativeInput dataset:
foo
13
overflow
7
some_text
20
a_word
1
toucan
10
toco
5
Step configuration:
Output dataset:
foo
13
foo
overflow
7
low
some_text
20
ext
a_word
1
ord
toucan
10
can
toco
5
oco
text step
text stepUse this step to add a text column where every value will be equal to the specified text.
Example
Input dataset:
Label 1
10
Label 2
1
Label 3
5
Step configuration:
Output dataset:
Label 1
10
Sales
Label 2
1
Sales
Label 3
5
Sales
todate step
todate stepConverts a string column into a date column based on a specified format.
Example
Input dataset:
Company 1
06/10/2019
13
Company 1
07/10/2019
7
Company 1
08/10/2019
20
Company 2
06/10/2019
1
Company 2
07/10/2019
10
Company 2
08/10/2019
5
Step configuration:
Output dataset:
Company 1
2019-10-06T00:00:00.000Z
13
Company 1
2019-10-07T00:00:00.000Z
7
Company 1
2019-10-08T00:00:00.000Z
20
Company 2
2019-10-06T00:00:00.000Z
1
Company 2
2019-10-07T00:00:00.000Z
10
Company 2
2019-10-08T00:00:00.000Z
5
top step
top stepReturn top N rows by group if groups is specified, else over full dataset.
Example 1: top without groups, ascending order
groups, ascending orderInput dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 4
Group 2
1
Label 6
Group 2
5
Label 2
Group 1
7
Example 2: top with groups, descending order
groups, descending orderInput dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 4
Group 2
1
Label 5
Group 2
10
Label 6
Group 2
5
Step configuration:
Output dataset:
Label 3
Group 1
20
Label 5
Group 2
10
totals step
totals stepAppend "total" rows to the dataset for specified dimensions. Computed rows result from an aggregation (either sum, average, count, count distinct, min, max, first or last)
Example 1: basic usage
Input dataset:
France
product A
2019
5
USA
product A
2019
10
France
product B
2019
10
USA
product B
2019
15
France
product A
2020
20
USA
product A
2020
20
France
product B
2020
30
USA
product B
2020
25
Step configuration:
Output dataset:
France
product A
2019
5
USA
product A
2019
10
France
product B
2019
10
USA
product B
2019
15
France
product A
2020
20
USA
product A
2020
20
France
product B
2020
30
USA
product B
2020
25
All countries
null
null
135
Example 2: With several totals and groups
Input dataset:
France
product A
2019
5
50
USA
product A
2019
10
100
France
product B
2019
10
100
USA
product B
2019
15
150
France
product A
2020
20
200
USA
product A
2020
20
200
France
product B
2020
30
300
USA
product B
2020
25
250
Step configuration:
Output dataset:
France
product A
2019
50
5
5
USA
product A
2019
100
10
10
France
product B
2019
100
10
10
USA
product B
2019
150
15
15
France
product A
2020
200
20
20
USA
product A
2020
200
20
20
France
product B
2020
300
30
30
USA
product B
2020
250
25
25
USA
All products
2020
450
45
22.5
France
All products
2020
500
50
25
USA
All products
2019
250
25
12.5
France
All products
2019
150
15
7.5
All countries
product B
2020
550
55
27.5
All countries
product A
2020
400
40
20
All countries
product B
2019
250
25
12.5
All countries
product A
2019
150
15
7.5
All countries
All products
2020
950
95
23.75
All countries
All products
2019
400
40
10
trim step
trim stepTrim spaces in a column.
Example
Input dataset:
' Company 1 '
Group 1
13
Company 1 - Group 1
' Company 2 '
Group 1
7
Company 2 - Group 1
Step configuration:
Output dataset:
'Company 1'
Group 1
13
Company 1 - Group 1
'Company 2'
Group 1
7
Company 2 - Group 1
unpivot step
unpivot stepUnpivot a list of columns to rows.
Example 1: with dropnaparameter to true
dropnaparameter to trueInput dataset:
Company 1
France
7
10
Company 2
France
2
Company 1
USA
12
6
Company 2
USA
1
3
Step configuration:
Output dataset:
Company 1
France
NB_CLIENTS
7
Company 1
France
REVENUES
10
Company 2
France
NB_CLIENTS
2
Company 1
USA
NB_CLIENTS
12
Company 1
USA
REVENUES
6
Company 2
USA
NB_CLIENTS
1
Company 2
USA
REVENUES
3
Example 1: with dropnaparameter to false
dropnaparameter to falseInput dataset:
Company 1
France
7
10
Company 2
France
2
Company 1
USA
12
6
Company 2
USA
1
3
Step configuration:
Output dataset:
Company 1
France
NB_CLIENTS
7
Company 1
France
REVENUES
10
Company 2
France
NB_CLIENTS
2
Company 2
France
REVENUES
Company 1
USA
NB_CLIENTS
12
Company 1
USA
REVENUES
6
Company 2
USA
NB_CLIENTS
1
Company 2
USA
REVENUES
3
uppercase step
uppercase step⚠️ Mongo's $toUpper operator does not support accents. If you have accents you need to uppercase with Mongo, use a replacetext step after uppercase.
Converts a string column to uppercase.
Example:
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Step configuration:
Output dataset:
LABEL 1
Group 1
13
LABEL 2
Group 1
7
LABEL 3
Group 1
20
uniquegroups step
uniquegroups stepAllow to get unique groups of values from one or several columns.
Example:
Input dataset:
Label 1
Group 1
13
Label 2
Group 1
7
Label 3
Group 1
20
Label 1
Group 2
1
Label 2
Group 1
2
Label 3
Group 1
3
Step configuration:
Output dataset:
Label 1
Group 1
Label 1
Group 2
Label 2
Group 1
Label 3
Group 1
waterfall step
waterfall stepThis step allows to generate a data structure useful to build waterfall charts. It breaks down the variation between two values (usually between two dates) accross entities. Entities are found in the labelsColumn, and can optionally be regrouped under common parents found in the parentsColumn for drill-down purposes.
Example 1: Basic usage
Input dataset:
Bordeaux
2019
135
Boston
2019
275
New-York
2019
115
Paris
2019
450
Bordeaux
2018
98
Boston
2018
245
New-York
2018
103
Paris
2018
385
Step configuration:
Output dataset:
2018
null
831
Paris
parent
65
Bordeaux
parent
37
Boston
parent
30
New-York
parent
12
2019
null
975
Example 2: With more options
Input dataset:
Bordeaux
France
product1
2019
65
Bordeaux
France
product2
2019
70
Paris
France
product1
2019
210
Paris
France
product2
2019
240
Boston
USA
product1
2019
130
Boston
USA
product2
2019
145
New-York
USA
product1
2019
55
New-York
USA
product2
2019
60
Bordeaux
France
product1
2018
38
Bordeaux
France
product2
2018
60
Paris
France
product1
2018
175
Paris
France
product2
2018
210
Boston
USA
product1
2018
95
Boston
USA
product2
2018
150
New-York
USA
product1
2018
50
New-York
USA
product2
2018
53
Step configuration:
Output dataset:
2018
2018
null
product1
358
2018
2018
null
product2
473
Bordeaux
France
child
product1
27
Bordeaux
France
child
product2
10
Boston
USA
child
product1
35
Boston
USA
child
product2
-5
France
France
parent
product2
40
France
France
parent
product1
62
New-York
USA
child
product1
5
New-York
USA
child
product2
7
Paris
France
child
product1
35
Paris
France
child
product2
30
USA
USA
parent
product2
2
USA
USA
parent
product1
40
2019
2019
null
product2
515
2019
2019
null
product1
460
Last updated
Was this helpful?
