[FIXED] Java 11 streams: split a list into chunks of n and pass each chunk as an argument

Issue

Let’s say I have this (Spring boot) code:

List<User> userList = userService.selectAll(); // this returns 1,000,000 rows

customerService.saveBulk(userList).get(); 

I want to split the list into smaller chunks and call saveBulk on each chunk in turn.

Is there a clean way to do this with Java streams?

saveBulk is annotated with @Async.

Solution

You can use Collectors.groupingBy() to split the list into chunks and then iterate over the chunks.


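import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final List<User> users = userService.selectAll();
final int partitionSize = 10000;
final AtomicInteger counter = new AtomicInteger();

users
    .stream()
    .collect(Collectors.groupingBy(x -> counter.getAndIncrement() / partitionSize))
    .values()
    // now you have a Collection<List<User>>
    // each list contains up to partitionSize elements (sequential streams only)
    .stream()
    .map(group -> customerService.saveBulk(group))
    .forEach(future -> {
        try {
            future.get(); // get() throws checked exceptions, so it needs a try/catch inside a lambda
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    });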

My Java is a little rusty, so take this as a sketch. If you want to make your own method @Async as well, you may prefer to return a single combined Future<T> instead of calling .get() on each future in turn.
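
The combined-future idea can be sketched roughly as follows. This is only an illustration under assumptions: it assumes saveBulk returns a CompletableFuture (the question only shows that it has a .get()), and the ChunkedSave class, its partition and saveAll helpers, and the stand-in saveBulk that just counts its input are all hypothetical names, not part of the original code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ChunkedSave {

    // Splits a list into consecutive sublists of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunkCount = (items.size() + chunkSize - 1) / chunkSize;
        return IntStream.range(0, chunkCount)
            .mapToObj(i -> items.subList(i * chunkSize,
                                         Math.min(items.size(), (i + 1) * chunkSize)))
            .collect(Collectors.toList());
    }

    // Hypothetical stand-in for customerService.saveBulk:
    // "saves" asynchronously and reports how many items it received.
    static CompletableFuture<Integer> saveBulk(List<String> batch) {
        return CompletableFuture.supplyAsync(batch::size);
    }

    // Starts one save per chunk and returns a single future
    // that completes with the total number of saved items.
    static CompletableFuture<Integer> saveAll(List<String> users, int chunkSize) {
        List<CompletableFuture<Integer>> futures = partition(users, chunkSize).stream()
            .map(ChunkedSave::saveBulk)
            .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> futures.stream()
                .mapToInt(CompletableFuture::join) // all futures are already complete here
                .sum());
    }

    public static void main(String[] args) {
        List<String> users = List.of("a", "b", "c", "d", "e");
        System.out.println(partition(users, 2));      // [[a, b], [c, d], [e]]
        System.out.println(saveAll(users, 2).join()); // 5
    }
}
```

Note that collecting the futures into a list first starts every chunk before anything is awaited, whereas the map/forEach pipeline above submits a chunk and waits for it before submitting the next. Which behaviour you want probably depends on how much parallel load your database and executor can take.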

Depending on how many users you have, it might be better to load the users in chunks, perform the needed actions on each chunk, and save it again. In this solution, userService.selectAll() loads all users into memory up front, which might be too much data at once (1,000,000 rows).

I think the best approach would be to paginate userService.selectAll(): query 10,000 users, do your work, and save those 10,000 users back. For this you need some kind of pagination over your data.

If your backend is a database accessed through Spring Data with Hibernate as the ORM, you can do it roughly like this:

  • Make your backing userRepository a PagingAndSortingRepository
  • Iterate over the batch-sized pages to operate in batches

Then the logic would be something like:

UserRepository:
@Repository
public interface UserRepository extends PagingAndSortingRepository<User, Long> {

}


UserService:
public Page<User> findPaginated(int pageNo, int pageSize) {

    Pageable paging = PageRequest.of(pageNo, pageSize);
    Page<User> pagedResult = userRepository.findAll(paging);

    return pagedResult;
}

Your Logic:

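int pageSize = 10000;
int totalPages = userService
    .findPaginated(0, pageSize).getTotalPages();

// Then you can iterate over all the pages however you like
for (int i = 0; i < totalPages; i++) {
    // findPaginated returns a Page<User>; getContent() gives the rows as a List<User>
    List<User> batch = userService.findPaginated(i, pageSize).getContent();
    // do your stuff with the batch
    // saveBulk is @Async: call .get() on its result if each batch should finish before the next is loaded
    customerService.saveBulk(batch);
}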
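
A variation on this loop is to keep fetching pages until one comes back empty, so you don't need the total page count up front. Below is a minimal, framework-free sketch of that idea. PagedBatchLoop, forEachPage and fakeFindPaginated are hypothetical names, and the in-memory list merely stands in for something like userService.findPaginated(pageNo, pageSize).getContent().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class PagedBatchLoop {

    // Calls fetchPage(0, pageSize), fetchPage(1, pageSize), ... until a page comes back empty,
    // handing each non-empty page to the handler. Returns the number of items processed.
    static <T> int forEachPage(BiFunction<Integer, Integer, List<T>> fetchPage,
                               int pageSize,
                               Consumer<List<T>> handler) {
        int processed = 0;
        for (int pageNo = 0; ; pageNo++) {
            List<T> batch = fetchPage.apply(pageNo, pageSize);
            if (batch.isEmpty()) {
                break; // no more rows
            }
            handler.accept(batch);
            processed += batch.size();
        }
        return processed;
    }

    // Hypothetical in-memory stand-in for a paginated query (assumption, not a real repository).
    static List<Integer> fakeFindPaginated(List<Integer> table, int pageNo, int pageSize) {
        int from = Math.min(pageNo * pageSize, table.size());
        int to = Math.min(from + pageSize, table.size());
        return table.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> table = List.of(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> batches = new ArrayList<>();
        int processed = forEachPage(
            (pageNo, pageSize) -> fakeFindPaginated(table, pageNo, pageSize),
            3,
            batch -> batches.add(batch));
        System.out.println(batches);   // [[1, 2, 3], [4, 5, 6], [7]]
        System.out.println(processed); // 7
    }
}
```

With a real database, page-by-page reads are only reliable if the query has a stable sort order (for example by primary key) and the work done on each batch doesn't change which rows fall on later pages. Otherwise rows can be skipped or seen twice.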

Answered By – Leonhard Bauer

Answer Checked By – Timothy Miller (Easybugfix Admin)
