AUTO_BUCKET
edit
Creates human-friendly buckets and returns a datetime
value for each row that
corresponds to the resulting bucket the row falls into. Combine AUTO_BUCKET
with STATS ... BY
to create a date histogram.
You provide a target number of buckets, a start date, and an end date, and it picks an appropriate bucket size to generate the target number of buckets or fewer. For example, this asks for at most 20 buckets over a whole year, which picks monthly buckets:
ROW date=TO_DATETIME("1985-07-09T00:00:00.000Z") | EVAL bucket=AUTO_BUCKET(date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
date:datetime | bucket:datetime |
---|---|
1985-07-09T00:00:00.000Z |
1985-07-01T00:00:00.000Z |
The goal isn’t to provide exactly the target number of buckets, it’s to pick a range that people are comfortable with that provides at most the target number of buckets.
If you ask for more buckets then AUTO_BUCKET
can pick a smaller range. For example,
asking for at most 100 buckets in a year will get you week long buckets:
ROW date=TO_DATETIME("1985-07-09T00:00:00.000Z") | EVAL bucket=AUTO_BUCKET(date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
date:datetime | bucket:datetime |
---|---|
1985-07-09T00:00:00.000Z |
1985-07-08T00:00:00.000Z |
AUTO_BUCKET
does not filter any rows. It only uses the provided time range to
pick a good bucket size. For rows with a date outside of the range, it returns a
datetime
that corresponds to a bucket outside the range. Combine AUTO_BUCKET
with WHERE
to filter rows.
A more complete example might look like:
FROM employees | WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z" | EVAL bucket = AUTO_BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z") | STATS AVG(salary) BY bucket | SORT bucket
AVG(salary):double | bucket:date |
---|---|
46305.0 |
1985-02-01T00:00:00.000Z |
44817.0 |
1985-05-01T00:00:00.000Z |
62405.0 |
1985-07-01T00:00:00.000Z |
49095.0 |
1985-09-01T00:00:00.000Z |
51532.0 |
1985-10-01T00:00:00.000Z |
54539.75 |
1985-11-01T00:00:00.000Z |
AUTO_BUCKET
does not create buckets that don’t match any documents. That’s
why the example above is missing 1985-03-01
and other dates.
Numeric fieldsedit
auto_bucket
can also operate on numeric fields like this:
FROM employees | WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z" | EVAL bs = auto_bucket(salary, 20, 25324, 74999) | SORT hire_date, salary | KEEP hire_date, salary, bs
hire_date:date | salary:integer | bs:double |
---|---|---|
1985-02-18T00:00:00.000Z |
66174 |
65000.0 |
1985-02-24T00:00:00.000Z |
26436 |
25000.0 |
1985-05-13T00:00:00.000Z |
44817 |
40000.0 |
1985-07-09T00:00:00.000Z |
62405 |
60000.0 |
1985-09-17T00:00:00.000Z |
49095 |
45000.0 |
1985-10-14T00:00:00.000Z |
54329 |
50000.0 |
1985-10-20T00:00:00.000Z |
48735 |
45000.0 |
1985-11-19T00:00:00.000Z |
52833 |
50000.0 |
1985-11-20T00:00:00.000Z |
33956 |
30000.0 |
1985-11-20T00:00:00.000Z |
74999 |
70000.0 |
1985-11-21T00:00:00.000Z |
56371 |
55000.0 |
Unlike the example above where you are intentionally filtering on a date range,
you rarely want to filter on a numeric range. So you have find the min
and max
separately. We don’t yet have an easy way to do that automatically. Improvements
coming!