AUTO_BUCKETedit

Creates human-friendly buckets and returns a datetime value for each row that corresponds to the resulting bucket the row falls into. Combine AUTO_BUCKET with STATS ... BY to create a date histogram.

You provide a target number of buckets, a start date, and an end date, and it picks an appropriate bucket size to generate the target number of buckets or fewer. For example, this asks for at most 20 buckets over a whole year, which picks monthly buckets:

ROW date=TO_DATETIME("1985-07-09T00:00:00.000Z")
| EVAL bucket=AUTO_BUCKET(date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
date:datetime bucket:datetime

1985-07-09T00:00:00.000Z

1985-07-01T00:00:00.000Z

The goal isn’t to provide exactly the target number of buckets, it’s to pick a range that people are comfortable with that provides at most the target number of buckets.

If you ask for more buckets then AUTO_BUCKET can pick a smaller range. For example, asking for at most 100 buckets in a year will get you week long buckets:

ROW date=TO_DATETIME("1985-07-09T00:00:00.000Z")
| EVAL bucket=AUTO_BUCKET(date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
date:datetime bucket:datetime

1985-07-09T00:00:00.000Z

1985-07-08T00:00:00.000Z

AUTO_BUCKET does not filter any rows. It only uses the provided time range to pick a good bucket size. For rows with a date outside of the range, it returns a datetime that corresponds to a bucket outside the range. Combine AUTO_BUCKET with WHERE to filter rows.

A more complete example might look like:

FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| EVAL bucket = AUTO_BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| STATS AVG(salary) BY bucket
| SORT bucket
AVG(salary):double bucket:date

46305.0

1985-02-01T00:00:00.000Z

44817.0

1985-05-01T00:00:00.000Z

62405.0

1985-07-01T00:00:00.000Z

49095.0

1985-09-01T00:00:00.000Z

51532.0

1985-10-01T00:00:00.000Z

54539.75

1985-11-01T00:00:00.000Z

AUTO_BUCKET does not create buckets that don’t match any documents. That’s why the example above is missing 1985-03-01 and other dates.

Numeric fieldsedit

auto_bucket can also operate on numeric fields like this:

FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| EVAL bs = auto_bucket(salary, 20, 25324, 74999)
| SORT hire_date, salary
| KEEP hire_date, salary, bs
hire_date:date salary:integer bs:double

1985-02-18T00:00:00.000Z

66174

65000.0

1985-02-24T00:00:00.000Z

26436

25000.0

1985-05-13T00:00:00.000Z

44817

40000.0

1985-07-09T00:00:00.000Z

62405

60000.0

1985-09-17T00:00:00.000Z

49095

45000.0

1985-10-14T00:00:00.000Z

54329

50000.0

1985-10-20T00:00:00.000Z

48735

45000.0

1985-11-19T00:00:00.000Z

52833

50000.0

1985-11-20T00:00:00.000Z

33956

30000.0

1985-11-20T00:00:00.000Z

74999

70000.0

1985-11-21T00:00:00.000Z

56371

55000.0

Unlike the example above where you are intentionally filtering on a date range, you rarely want to filter on a numeric range. So you have find the min and max separately. We don’t yet have an easy way to do that automatically. Improvements coming!