Whether state exists or not.
Get the state value if it exists, or throw NoSuchElementException.
Get the state value if it exists, or throw NoSuchElementException.
Get the current processing time as milliseconds in epoch time.
Get the current processing time as milliseconds in epoch time.
In a streaming query, this will return a constant value throughout the duration of a trigger, even if the trigger is re-executed.
Get the current event time watermark as milliseconds in epoch time.
Get the current event time watermark as milliseconds in epoch time.
In a streaming query, this can be called only when watermark is set before calling
[map/flatMap]GroupsWithState
. In a batch query, this method always returns -1.
Get the state value as a scala Option.
Whether the function has been called because the key has timed out.
Whether the function has been called because the key has timed out.
This can return true only when timeouts are enabled in [map/flatMap]GroupsWithState
.
Remove this state.
Set the timeout duration for this key as a string.
Set the timeout duration for this key as a string. For example, "1 hour", "2 days", etc.
This method has no effect when used in a batch query.
,Processing time timeout must be enabled in
[map/flatMap]GroupsWithState
for calling this method.
Set the timeout duration in ms for this key.
Set the timeout duration in ms for this key.
This method has no effect when used in a batch query.
,Processing time timeout must be enabled in
[map/flatMap]GroupsWithState
for calling this method.
Set the timeout timestamp for this key as a java.sql.Date and an additional duration as a string (e.g.
Set the timeout timestamp for this key as a java.sql.Date and an additional duration as a string (e.g. "1 hour", "2 days", etc.). The final timestamp (including the additional duration) cannot be older than the current watermark.
This method has no side effect when used in a batch query.
,Event time timeout must be enabled in
[map/flatMap]GroupsWithState
for calling this method.
Set the timeout timestamp for this key as a java.sql.Date.
Set the timeout timestamp for this key as a java.sql.Date. This timestamp cannot be older than the current watermark.
This method has no side effect when used in a batch query.
,Event time timeout must be enabled in
[map/flatMap]GroupsWithState
for calling this method.
Set the timeout timestamp for this key as milliseconds in epoch time and an additional duration as a string (e.g.
Set the timeout timestamp for this key as milliseconds in epoch time and an additional duration as a string (e.g. "1 hour", "2 days", etc.). The final timestamp (including the additional duration) cannot be older than the current watermark.
This method has no side effect when used in a batch query.
,Event time timeout must be enabled in
[map/flatMap]GroupsWithState
for calling this method.
Set the timeout timestamp for this key as milliseconds in epoch time.
Set the timeout timestamp for this key as milliseconds in epoch time. This timestamp cannot be older than the current watermark.
This method has no effect when used in a batch query.
,Event time timeout must be enabled in
[map/flatMap]GroupsWithState
for calling this method.
Update the value of the state.
:: Experimental ::
Wrapper class for interacting with per-group state data in
mapGroupsWithState
andflatMapGroupsWithState
operations onKeyValueGroupedDataset
.Detail description on
[map/flatMap]GroupsWithState
operation -------------------------------------------------------------- Both,mapGroupsWithState
andflatMapGroupsWithState
inKeyValueGroupedDataset
will invoke the user-given function on each group (defined by the grouping function inDataset.groupByKey()
) while maintaining user-defined per-group state between invocations. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger. That is, in every batch of theStreamingQuery
, the function will be invoked once for each group that has data in the trigger. Furthermore, if timeout is set, then the function will invoked on timed out groups (more detail below).The function is invoked with following parameters.
In case of a batch Dataset, there is only one invocation and state object will be empty as there is no prior state. Essentially, for batch Datasets,
[map/flatMap]GroupsWithState
is equivalent to[map/flatMap]Groups
and any updates to the state and/or timeouts have no effect.The major difference between
mapGroupsWithState
andflatMapGroupsWithState
is that the former allows the function to return one and only one record, whereas the latter allows the function to return any number of records (including no records). Furthermore, theflatMapGroupsWithState
is associated with an operation output mode, which can be eitherAppend
orUpdate
. Semantically, this defines whether the output records of one trigger is effectively replacing the previously output records (from previous triggers) or is appending to the list of previously output records. Essentially, this defines how the Result Table (refer to the semantics in the programming guide) is updated, and allows us to reason about the semantics of later operations.Important points to note about the function (both mapGroupsWithState and flatMapGroupsWithState).
GroupStateTimeout
below.Important points to note about using
GroupState
.IllegalArgumentException
.GroupState
are not thread-safe. This is to avoid memory barriers.remove()
is called, thenexists()
will returnfalse
,get()
will throwNoSuchElementException
andgetOption()
will returnNone
update(newState)
is called, thenexists()
will again returntrue
,get()
andgetOption()
will return the updated value.Important points to note about using
GroupStateTimeout
.timeout
param in[map|flatMap]GroupsWithState
, but the exact timeout duration/timestamp is configurable per group by callingsetTimeout...()
inGroupState
.GroupStateTimeout.ProcessingTimeTimeout
) or event time (i.e.GroupStateTimeout.EventTimeTimeout
).ProcessingTimeTimeout
, the timeout duration can be set by callingGroupState.setTimeoutDuration
. The timeout will occur when the clock has advanced by the set duration. Guarantees provided by this timeout with a duration of D ms are as follows:EventTimeTimeout
, the user also has to specify the the the event time watermark in the query usingDataset.withWatermark()
. With this setting, data that is older than the watermark are filtered out. The timeout can be set for a group by setting a timeout timestamp usingGroupState.setTimeoutTimestamp()
, and the timeout would occur when the watermark advances beyond the set timestamp. You can control the timeout delay by two parameters - (i) watermark delay and an additional duration beyond the timestamp in the event (which is guaranteed to be newer than watermark due to the filtering). Guarantees provided by this timeout are as follows:GroupState.hasTimedOut()
set to true.Scala example of using GroupState in
mapGroupsWithState
:Java example of using
GroupState
:User-defined type of the state to be stored for each group. Must be encodable into Spark SQL types (see
Encoder
for more details).2.2.0