Hi @afattahi
Gurobi has a parameter that allows the use of multiple NUMA nodes.
I think it is called GURO_PAR_PROCGROUPS, used as GURO_PAR_PROCGROUPS 2, where 2 is the number of NUMA nodes.
It is indeed a limitation of Windows. But apparently the code we intend to run can be modified to schedule jobs across processor groups, which is called "making the code group aware".
This article https://bitsum.com/general/the-64-core-threshold-processor-groups-and-windows/ suggests contacting the app developer to make the app group aware.
As @Mstislav Simonenkov suggested, apparently Gurobi has this feature.
@Mstislav Simonenkov This is great!
I searched for the GURO_PAR_PROCGROUPS parameter in Gurobi's documentation, but I could not find any information. Would you please provide more information regarding this parameter, or let us know how to modify it through AIMMS?
Hi @afattahi
We actively discussed the issue of NUMA nodes with Marcel Hunting. I want to thank him again.
CPLEX supports the parameter CPXPARAM_CPUmask, which gives the user control over NUMA nodes.
https://www.ibm.com/support/pages/node/6435091
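For illustration, outside AIMMS the mask can be set through the CPLEX Python API. This is only a minimal sketch, assuming CPXPARAM_CPUmask is exposed as parameters.cpumask in the Python API (it was introduced in CPLEX 20.1):
# Minimal sketch: bind CPLEX to specific logical processors.
# Assumption: CPXPARAM_CPUmask is exposed as parameters.cpumask.
import cplex
cpx = cplex.Cplex()
# Hex bitmask of the logical processors the solver may use;
# "ff" would bind CPLEX to the first 8 logical processors.
cpx.parameters.cpumask.set("ff")
cpx.parameters.threads.set(8)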
However, as far as I'm aware, AIMMS currently does not support the CPXPARAM_CPUmask parameter for CPLEX.
As for the Gurobi parameter, unfortunately I don't know how to modify it through AIMMS. I'm using it with other software via a Gurobi parameter (.prm) file.
Another solution is to switch the OS from Windows to Linux.
@Mstislav Simonenkov Thank you for the accurate input.
I have not used the Linux version of AIMMS. I would prefer to stick to Windows as much as possible.
@MarcelRoelofs Would it be possible to incorporate support for the CPLEX CPU mask parameter (https://www.ibm.com/docs/en/icos/20.1.0?topic=parameters-cpu-mask-bind-threads-cores) (and the similar Gurobi parameter) in AIMMS?
@afattahi Support for NUMA machines is on our backlog, but currently it does not have the highest priority because it seems that the performance gain of using more than 32 threads is limited. See for example this discussion:
https://or.stackexchange.com/questions/5416/gurobi-and-cplex-cannot-exploit-more-than-32-cores-of-machine
Moreover, using multiple NUMA nodes to solve one MIP problem might harm the performance because then data has to be transferred through the cross-chip interconnect which adds overhead (see: https://www.cse.wustl.edu/~angelee/cse539/papers/numa.pdf).
For NUMA awareness of CPLEX, see: https://www.ibm.com/support/pages/node/6435091
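As a side note, you can quickly check how many processor groups and NUMA nodes Windows reports. A minimal Python sketch using the Win32 functions GetActiveProcessorGroupCount and GetNumaHighestNodeNumber from kernel32 via ctypes:
# Minimal sketch: query Windows processor groups and NUMA nodes.
import ctypes
kernel32 = ctypes.windll.kernel32
# Each Windows processor group holds at most 64 logical processors.
groups = kernel32.GetActiveProcessorGroupCount()
# The node count is the highest NUMA node number plus one.
highest_node = ctypes.c_ulong(0)
kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest_node))
print("Processor groups:", groups)
print("NUMA nodes:", highest_node.value + 1)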
@Mstislav Simonenkov GURO_PAR_PROCGROUPS seems to be an undocumented parameter. You can use a Gurobi parameter file in AIMMS by enabling the Gurobi option ‘Read Parameter File’.
@Marcel Hunting
@afattahi
I would like to share the results of some of my tests when switching from 48 to 96 threads:
For MIP models, the gap improvement is ~0.25% with a 24h time limit.
LP models reach optimality significantly faster, ~40% (barrier method).
@Mstislav Simonenkov Very interesting results for the LP model. I assume you measured the improvements by the solving time, not the total elapsed time. May I ask which CPU configuration you are using?
Also, if you are using Windows, would you please let us know how you distribute threads across NUMA nodes?
@afattahi
According to the AIMMS help, my suggestion is to try the following:
1) Set the Gurobi option Read Parameter File to Yes in the project options.
2) Create a text file with the desired Gurobi settings. An example is shown below:
Heuristics 0.1
GURO_PAR_PROCGROUPS 2
3) Save this text file as gurobi.prm and place it in the AIMMS project directory. (A quick way to verify the file outside AIMMS is sketched below.)
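For reference, this is how the same file can be checked with gurobipy, assuming a working Gurobi Python installation (Model.read accepts .prm parameter files):
# Minimal sketch: verify that Gurobi parses the parameter file.
import gurobipy as gp
m = gp.Model()
m.read("gurobi.prm")  # applies the parameter settings from the file
print(m.Params.Heuristics)  # should print 0.1 if the file was read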
Please let me know if it helped.
@Mstislav Simonenkov
I tried what the AIMMS help and you instructed. Unfortunately, AIMMS does not apply the gurobi.prm parameters to my model.
My gurobi.prm file:
# Parameter settings
THREADS 128
GURU_PAR_PROCGROUPS 2
I placed the file in the AIMMS root project folder.
According to the AIMMS help, the gurobi.prm file overrides the AIMMS settings. I set the thread limit option to 80, and when the run finished I could see in the Gurobi log file that only 80 threads had been used, meaning that AIMMS did not read the .prm file. Also, while running, I noticed that only one NUMA node was being used.
Please let me know if I am making a mistake here.
@afattahi
Could you please try changing GURU_PAR_PROCGROUPS to GURO_PAR_PROCGROUPS?
Please let me know if it helped.
@Mstislav Simonenkov Ah, I made a typo in my post; I actually wrote GURO_PAR_PROCGROUPS 2. I tried both space and tab characters between the name and the number. Also, I am using an academic license.
Maybe @Marcel Hunting can point out what I am missing?
@Marcel Hunting Great to hear that NUMA support is on your development list. I deal with large LP problems for energy system modeling, and I almost always use the barrier method for its relatively superior solve speed. My guess is that increasing the thread limit to more than 64 would improve solving speed for LP problems. Of course the improvement is not linear, but it would still be a good one. My knowledge of this issue is very limited and mainly comes from online forums.
It would be very helpful if support for NUMA nodes became available, at least for LP problems.
@afattahi I have no idea what is wrong with your parameter file. If one of the parameter names is incorrect then Gurobi will still read and set the other options, so Threads should have been set to 128. Did you place ‘gurobi.prm’ in the same folder as the .aimms file?
@Marcel Hunting @Mstislav Simonenkov Thank you for all your input. I have now learned how to get greater control over the solver options. After some trial and error, I found my rookie mistake: I was saving the file as gurobi.prm.txt without seeing the hidden extension.
I was able to successfully modify the PROCGROUPS parameter, but the result was disappointing.
In my LP model (barrier method), I saw a 35% increase in solving time when moving from the default (32) threads to 128 threads. I am not sure why there is a decrease in performance.
@afattahi
Glad we were able to help you!
Very interesting results! I achieved the best net solving time by using half of the cores (96 of 192).