Subject:[jira] [Comment Edited] (SPARK-21641) Combining windowing (groupBy) and mapGroupsWithState (groupByKey) in Spark Structured Streaming
From:kant kodali (JIRA) (ji@apache.org)
Date:Oct 10, 2017 9:59:00 pm
List:org.apache.spark.issues

[ https://issues.apache.org/jira/browse/SPARK-21641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199811#comment-16199811 ]

kant kodali edited comment on SPARK-21641 at 10/11/17 4:58 AM:

---------------------------------------------------------------

@[~marmbrus] When can we possibly expect this?

was (Author: kant): [~marmbrus]

Combining windowing (groupBy) and mapGroupsWithState (groupByKey) in Spark
Structured Streaming

-----------------------------------------------------------------------------------------------

Key: SPARK-21641
URL: https://issues.apache.org/jira/browse/SPARK-21641
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 2.2.0
Reporter: Tudor Miu

Given a stream of timestamped data with watermarking, there seems to be no way
to combine (1) the {{groupBy}} operation to achieve windowing by the timestamp
field and other grouping criteria with (2) the {{groupByKey}} operation in order
to apply {{mapGroupsWithState}} to the groups for custom sessionization.

For context:
- calling {{groupBy}}, which supports windowing, on a Dataset returns a
{{RelationalGroupedDataset}}, which does not have {{mapGroupsWithState}}.
- calling {{groupByKey}}, which supports {{mapGroupsWithState}}, returns a
{{KeyValueGroupedDataset}}, but that has no support for windowing.

The suggestion is to _somehow_ unify the two APIs.
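
To make the request concrete, here is a small library-free sketch in plain Python (not Spark code, and not something proposed in this issue) of what a combined operation could mean: each event is bucketed into a tumbling window, the pair (key, window start) serves as the grouping key, and a stateful function runs once per group with state that persists across batches. The names {{window_start}}, {{group_by_key_and_window}}, {{map_groups_with_state}} and {{count_events}}, the 10-minute window and the sample events are all hypothetical; watermarking and state expiry are not modelled.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=10)  # hypothetical tumbling-window length
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, size=WINDOW):
    """Start of the epoch-aligned tumbling window that contains ts."""
    return ts - ((ts - EPOCH) % size)


def group_by_key_and_window(events):
    """Group (user, ts, value) events by the composite key (user, window start)."""
    groups = defaultdict(list)
    for user, ts, value in events:
        groups[(user, window_start(ts))].append((ts, value))
    return groups


def map_groups_with_state(batch, states, func):
    """Call func(key, rows, state) once per group; states persists across batches."""
    results = []
    for key, rows in sorted(group_by_key_and_window(batch).items()):
        state = states.setdefault(key, {"count": 0})
        results.append(func(key, rows, state))
    return results


def count_events(key, rows, state):
    """Toy stateful function: running event count per (user, window)."""
    state["count"] += len(rows)
    user, start = key
    return (user, start.strftime("%H:%M"), state["count"])


def at(hour, minute):
    """Helper for building sample timestamps on one fixed day (UTC)."""
    return datetime(2017, 10, 10, hour, minute, tzinfo=timezone.utc)


states = {}  # survives between batches, like per-group state in a stream
batch1 = [("a", at(9, 1), 1), ("a", at(9, 8), 1), ("b", at(9, 12), 1)]
batch2 = [("a", at(9, 9), 1), ("a", at(9, 15), 1)]

r1 = map_groups_with_state(batch1, states, count_events)
r2 = map_groups_with_state(batch2, states, count_events)
print(r1)
print(r2)
```

In the second batch, the 09:09 event for user "a" lands in the existing 09:00 window and extends its count, while the 09:15 event opens a new group with fresh state. That per-(key, window) state is the behaviour neither API offers on its own according to the description above.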