First, I think you may have a bug in how you intended to use the CyclicBarrier. Currently you are initializing it with the number of executor threads as the number of parties. However, you have an additional party: the main thread.
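So I think you need to create the barrier with one more party than the number of executor threads, so that the main thread's own await() is counted as well.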
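To illustrate, here is a minimal sketch. It assumes a fixed pool with one thread per worker and a placeholder unit of work. The class `BarrierWithMainThread` and the method `runWorkers` are hypothetical names, not taken from your code. The key point is that the barrier is created with `nThreads + 1` parties, and the main thread calls `await()` just like the workers.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierWithMainThread {

    // Hypothetical helper: starts nThreads workers, then waits on the barrier
    // together with them. Returns how many workers had finished their work
    // by the time the main thread was released.
    static int runWorkers(int nThreads) throws InterruptedException, BrokenBarrierException {
        // One party per worker thread, plus one for the main thread.
        CyclicBarrier barrier = new CyclicBarrier(nThreads + 1);
        AtomicInteger finished = new AtomicInteger();
        // The pool needs one thread per worker, because every worker blocks in await().
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            for (int i = 0; i < nThreads; i++) {
                executor.submit(() -> {
                    finished.incrementAndGet(); // placeholder for the real work
                    barrier.await();            // wait for the other parties
                    return null;                // makes this lambda a Callable
                });
            }
            // The main thread is the extra party. With only nThreads parties,
            // the first nThreads arrivals would trip the barrier and whichever
            // party arrived last would be left waiting for a new round.
            barrier.await();
            return finished.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWorkers(4)); // prints 4
    }
}
```

When the main thread's `await()` returns, every worker has reached the barrier, so all of their work before `await()` is complete.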
As you increase the number of tasks, you increase the total overhead, because each task adds some overhead of its own. This means you want to minimise the number of tasks, i.e. make it the same as the number of CPUs you have.
For some workloads, using double the number of CPUs can be better when the work is not evenly distributed across the tasks.
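As a rough sketch of sizing the task count this way, the example below splits a range into a chosen number of contiguous chunks. It assumes summing integers as a stand-in for real work, and `ChunkedSum` and `parallelSum` are hypothetical names. The pool has one thread per CPU (`Runtime.getRuntime().availableProcessors()`), and the caller picks the number of tasks: one per CPU for even work, or twice that for uneven work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSum {

    // Hypothetical helper: sums 0..n-1 by splitting the range into roughly
    // `tasks` contiguous chunks, one task per chunk. Requires tasks >= 1.
    static long parallelSum(int n, int tasks) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(cpus);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + tasks - 1) / tasks; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                parts.add(executor.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) sum += i;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // wait for each chunk
            return total;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        // One task per CPU keeps the per-task overhead low for even work...
        System.out.println(parallelSum(1_000_000, cpus));     // prints 499999500000
        // ...while 2x CPUs leaves spare tasks for idle threads when work is uneven.
        System.out.println(parallelSum(1_000_000, 2 * cpus)); // prints 499999500000
    }
}
```

With more tasks than threads, a thread that finishes a cheap chunk early can pick up another one, which is why the 2x setting can help when some chunks take longer than others.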