On Mon, Jun 19, 2017 at 9:14 AM, Shekhar Bansal <shek...@yahoo.com>
Thanks a lot Kishor.
I think I can treat the HDFS directory as a resource and the modulo of each
filename's hash as tasks. Is there a better way of doing this in Helix?
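The hash-modulo mapping described above could be sketched roughly as follows. This is just an illustration, not a Helix API; the class and method names (`FilePartitioner`, `partitionFor`) and the partition count are hypothetical:

```java
// Sketch: map each HDFS filename to one of N partitions (tasks) of a
// single resource by taking its hash modulo the partition count.
// All names here are illustrative, not part of the Helix API.
public class FilePartitioner {
    private final int numPartitions;

    public FilePartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // floorMod keeps the result non-negative even when hashCode() is negative
    public int partitionFor(String filename) {
        return Math.floorMod(filename.hashCode(), numPartitions);
    }
}
```

Each instance of the app would then claim only the partitions Helix assigns to it and process the files that hash to those partitions.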
On Monday, June 19, 2017 8:15 PM, kishore g <g.ki...@gmail.com> wrote:
1. Currently, Helix ensures even distribution of partitions within a
resource, not across resources. Is it possible for you to add tasks as part
of the same resource?
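If the tasks can live under one resource, creating a single resource with one partition per task might look like the sketch below. The ZooKeeper address, cluster name, resource name, and partition count are placeholders, not values from this thread; `ClusterSetup`, `addResourceToCluster`, and `rebalanceStorageCluster` are real Helix tools-package APIs:

```java
import org.apache.helix.tools.ClusterSetup;

public class SetupOneResource {
    public static void main(String[] args) {
        // Placeholder ZooKeeper address and cluster/resource names.
        ClusterSetup setup = new ClusterSetup("localhost:2181");

        // addCluster also installs the default state model definitions.
        setup.addCluster("FILE_PROCESSING", true);

        // One resource with N partitions (one per task); Helix spreads
        // these partitions evenly across the live participants.
        int numPartitions = 16;
        setup.addResourceToCluster("FILE_PROCESSING", "hdfsFiles",
                numPartitions, "OnlineOffline");

        // Compute the initial assignment with 1 replica per partition.
        setup.rebalanceStorageCluster("FILE_PROCESSING", "hdfsFiles", 1);
    }
}
```

This is setup code against a live ZooKeeper, so it is only a sketch of the shape of the calls, not something runnable in isolation.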
2 & 3. Yes, you can start the controller as part of your process. But since
you said you launch this on Kubernetes every 5 minutes, I suggest keeping
the controller and ZooKeeper running all the time. Controllers are
lightweight, and you can get away with an entry-level container spec. It's
fine to launch Helix participants every 5 minutes.
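Embedding the controller as described above might look like the sketch below; the ZooKeeper address and names are placeholders. When several processes each start a controller for the same cluster, Helix performs leader election via ZooKeeper, so only the elected leader drives the cluster and the rest stand by:

```java
import org.apache.helix.HelixManager;
import org.apache.helix.controller.HelixControllerMain;

public class EmbeddedController {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address, cluster name, and controller name.
        HelixManager controller = HelixControllerMain.startHelixController(
                "localhost:2181", "FILE_PROCESSING", "controller-1",
                HelixControllerMain.STANDALONE);
        try {
            Thread.currentThread().join(); // keep the controller running
        } finally {
            controller.disconnect();
        }
    }
}
```

Because this connects to a live ZooKeeper, it is a shape-of-the-API sketch rather than a self-contained runnable example.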
On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <shek...@yahoo.com>
I have a standalone, containerized Java app; it reads data from HDFS, does
some transformations, and writes data to remote storage. I want to make it
scalable by launching multiple instances of this Java app. My problem is
how to assign tasks among these instances. Can Helix solve this problem?
If yes, can you please help me with the following:
1. I followed the Helix quickstart example and created one resource per
file, but node1 got assigned master for all resources. Is this because of
the simple StateModelDefinition used in the quickstart example, am I using
it the wrong way, or is it some limitation of Helix?
2. I want to avoid running a separate controller process. If I start the
controller as part of setup, will Helix be able to elect a master
controller (in standalone mode)? Is it advisable to run tens of controllers
in distributed mode?
3. I schedule my app every five minutes using a Kubernetes cron job. Is it
advisable to use Helix for such short-lived processes?