librelist archives

Batch Redis Inserts

From:
Jake Mack
Date:
2012-08-21 @ 20:55
I wanted to bring this up again because I think it would be a very useful
addition, especially considering how much work Sidekiq can process thanks
to its speed. There is already a pull request for it that I think could be
revived:

https://github.com/mperham/sidekiq/pull/264

Basically, I'd like to be able to call one function that inserts our ~800k
jobs into Sidekiq in one shot (or in a few, if Redis has some maximum
request size that I don't see documented for the command) rather than
making 800k round trips to the Redis instance. Splitting the insertion
across multiple workers that each insert a chunk of jobs only papers over
the issue; it is neither efficient nor scalable.
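
To make the round-trip argument concrete, here is a minimal sketch of chunked insertion. It assumes the jobs live in a Redis list and that the list is filled with the variadic form of LPUSH. FakeRedis, push_in_chunks, the chunk_size value, the HardWorker class name and the "queue:<name>" key are all illustrative assumptions, not taken from the pull request.

```ruby
require "json"

# Hypothetical in-memory stand-in for a Redis connection. It only counts
# round trips and stores list contents, so the sketch runs without a server.
class FakeRedis
  attr_reader :round_trips, :lists

  def initialize
    @round_trips = 0
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # Mirrors the variadic form of LPUSH: one call, many values.
  def lpush(key, *values)
    @round_trips += 1
    values.each { |value| @lists[key].unshift(value) }
    @lists[key].size
  end
end

# Push payloads in slices so that no single command grows without bound.
# chunk_size is an arbitrary illustrative value, not a documented Redis limit.
def push_in_chunks(redis, queue, payloads, chunk_size: 1_000)
  payloads.each_slice(chunk_size) do |slice|
    redis.lpush("queue:#{queue}", *slice)
  end
end

redis = FakeRedis.new
payloads = (1..2_500).map do |i|
  JSON.generate("class" => "HardWorker", "args" => [i])
end

push_in_chunks(redis, "default", payloads)
puts redis.round_trips # 3 round trips for 2,500 jobs instead of 2,500
```

With a real connection, the same slicing would presumably turn 800k round trips into a few hundred, depending on the chunk size chosen.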

Thoughts?

Jake

Re: [sidekiq] Batch Redis Inserts

From:
Mike Perham
Date:
2012-08-22 @ 15:27
I took another look at that PR and saw one major issue: it sends all N
jobs through the middleware pipeline at once.  The middleware API is
one distinct job at a time.
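
A rough sketch of why that matters, using a simplified, hypothetical chain rather than Sidekiq's actual classes. The call(worker_class, item, queue) signature, the Chain class and the TagWithQueue middleware are assumptions made for illustration.

```ruby
# Hypothetical, simplified middleware chain: each entry receives ONE job
# hash and must yield for the job to continue down the chain.
class Chain
  def initialize(*entries)
    @entries = entries
  end

  def invoke(worker_class, item, queue, &final)
    remaining = @entries.dup
    step = lambda do
      if remaining.empty?
        final.call
      else
        remaining.shift.call(worker_class, item, queue, &step)
      end
    end
    step.call
  end
end

# Example middleware that assumes `item` is a single job hash.
class TagWithQueue
  def call(_worker_class, item, queue)
    item["queue"] = queue
    yield
  end
end

chain = Chain.new(TagWithQueue.new)

job = { "class" => "HardWorker", "args" => [1] }
pushed = []
chain.invoke("HardWorker", job, "default") { pushed << job }

# Handing the same middleware an Array of jobs fails, because
# Array#[]= expects an Integer index rather than a String key.
begin
  chain.invoke("HardWorker", [job, job.dup], "default") {}
rescue TypeError => e
  puts "batch rejected: #{e.class}"
end
```

Any existing middleware written against a single job hash would hit the same kind of failure if it were handed N jobs at once.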

Re: [sidekiq] Batch Redis Inserts

From:
Jake Mack
Date:
2012-08-23 @ 18:56
Another issue I noticed is that each job doesn't get a unique jid 
(because the PR was created before jids were implemented). That's 
relatively easy to fix, at least.
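
A minimal sketch of that fix, assuming each job is a plain hash and that a random hex string is an acceptable jid. The with_unique_jids helper, the HardWorker name and the 24-character format are illustrative assumptions.

```ruby
require "securerandom"

# Assign a fresh jid to every job hash in the batch. The 24-character hex
# format is an assumption; any sufficiently random ID would do.
def with_unique_jids(jobs)
  jobs.map { |job| job.merge("jid" => SecureRandom.hex(12)) }
end

jobs = Array.new(3) { |i| { "class" => "HardWorker", "args" => [i] } }
with_unique_jids(jobs).each { |job| puts job["jid"] }
```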

The middleware API issue is bigger, though. I haven't been able to come
up with an elegant way to resolve it yet; I'm not sure if you have any
ideas there. Batch insertion and a one-job-at-a-time middleware API seem
to be inherently incompatible.

One solution I thought up involves changing the middleware API to always 
receive an array of jobs. Most of the time, the array would have just 
one job in it, but for batches it could receive the full list. This 
seems like it would complicate writing the middleware a bit (and I 
understand changing the middleware API signature is probably not 
desirable either).

The other solution that came to mind was having the batch insert skirt 
around the middleware. It would be similar to Rails' update_all function 
which skips all of the ActiveRecord callbacks and validations and jumps 
right to the DB to do a batch update. It would skip the client side 
middleware and simply push all jobs in one large chunk. The server side 
middleware would be unaffected. This slight change in semantics could be 
made clear in the documentation.
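
Roughly, the second option could look like the sketch below. It is a hypothetical client, not Sidekiq's actual implementation: SketchClient, push_batch, RecordingRedis, the audit middleware and the (item, queue) middleware signature are all assumptions made for illustration.

```ruby
require "json"
require "securerandom"

# Hypothetical client. `redis` is any object responding to
# lpush(key, *values); `middleware` is a list of callables that take
# (item, queue) plus a block and call the block to continue.
class SketchClient
  def initialize(redis, middleware = [])
    @redis = redis
    @middleware = middleware
  end

  # Normal path: one job, full client-side middleware, one round trip.
  def push(item, queue: "default")
    item = item.merge("jid" => SecureRandom.hex(12))
    run_middleware(item, queue) do
      @redis.lpush("queue:#{queue}", JSON.generate(item))
    end
    item["jid"]
  end

  # Batch path: client-side middleware is skipped on purpose, and all
  # payloads go out in a single variadic LPUSH.
  def push_batch(klass, args_list, queue: "default")
    items = args_list.map do |args|
      { "class" => klass, "args" => args, "jid" => SecureRandom.hex(12) }
    end
    @redis.lpush("queue:#{queue}", *items.map { |item| JSON.generate(item) })
    items.map { |item| item["jid"] }
  end

  private

  # Wrap the final block in each middleware, outermost first.
  def run_middleware(item, queue, &final)
    chain = @middleware.reverse.reduce(final) do |nxt, mw|
      -> { mw.call(item, queue, &nxt) }
    end
    chain.call
  end
end

# In-memory stand-in so the sketch runs without a Redis server.
class RecordingRedis
  attr_reader :calls

  def initialize
    @calls = []
  end

  def lpush(key, *values)
    @calls << [key, values]
    values.size
  end
end

seen = []
audit = ->(item, _queue, &nxt) { seen << item["jid"]; nxt.call }
redis = RecordingRedis.new
client = SketchClient.new(redis, [audit])

client.push({ "class" => "HardWorker", "args" => [0] })
client.push_batch("HardWorker", [[1], [2], [3]])

puts redis.calls.size # two LPUSH calls in total
puts seen.size        # only the single push went through the middleware
```

Server-side middleware is not modelled here, since it would run when each job is processed, regardless of how the job was enqueued.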

Thoughts?

Re: [sidekiq] Batch Redis Inserts

From:
Mike Perham
Date:
2012-08-23 @ 19:23
On Thu, Aug 23, 2012 at 11:56 AM, Jake Mack <jakemack@gmail.com> wrote:
> Another issue I noticed is that each job doesn't get a unique jid (because
> the PR was created before jids were implemented). That's relatively easy to
> fix, at least.
>
> The middleware API issue is bigger though. I haven't been able to come up
> with a nice, elegant way to resolve that yet, not sure if you have any ideas
> there. They seem to be inherently incompatible.
>
> One solution I thought up involves changing the middleware API to always
> receive an array of jobs. Most of the time, the array would have just one
> job in it, but for batches it could receive the full list. This seems like
> it would complicate writing the middleware a bit (and I understand changing
> the middleware API signature is probably not desirable either).

Correct, the signature can't change for a feature this edge-casey.

> The other solution that came to mind was having the batch insert skirt
> around the middleware. It would be similar to Rails' update_all function
> which skips all of the ActiveRecord callbacks and validations and jumps
> right to the DB to do a batch update. It would skip the client side
> middleware and simply push all jobs in one large chunk. The server side
> middleware would be unaffected. This slight change in semantics could be
> made clear in the documentation.

Yeah, that's definitely a reasonable trade-off.

mike