Solved

How to bypass processor groups to support more than 64 logical cores

June 4, 2021

afattahi
Enthusiast

Hi AIMMSians,

I am using the AMD 3995WX processor, which has 128 logical processors. I realized that a single instance of AIMMS only uses 64 logical processors, because Windows groups processors into processor groups of at most 64. To use all 128 processors, I can run two instances of AIMMS at the same time: the first instance gets the affinity of group 0, and I then change the affinity of the second instance so that it uses the second NUMA node (group 1). Still, I would like a single instance of AIMMS to employ all logical processors. I assume the same problem exists on servers, where the number of logical cores often exceeds 64.

Is there any way to bypass Windows' processor group policy (i.e. make AIMMS group aware) and distribute the job across all 128 available threads?

Note 1: I tried both the CPLEX and Gurobi solvers. I am not sure whether this limitation comes from AIMMS or from the solvers.

Note 2: I am aware that I could disable Hyperthreading/SMT, but that is not desirable.
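As a rough illustration of the arithmetic in the question, here is a minimal Python sketch. It assumes, as a simplification, that Windows splits the logical processors into as few groups of at most 64 as possible and balances their sizes; in reality Windows builds the groups from NUMA nodes, so the actual layout can differ. The function names are hypothetical, and the "one group per process" behaviour is the one described in the question for the Windows versions discussed here.

```python
import math

MAX_PER_GROUP = 64  # Windows limit on logical processors per processor group


def processor_groups(logical_processors: int, max_per_group: int = MAX_PER_GROUP) -> list[int]:
    """Simplified model: fewest groups of at most max_per_group, sizes balanced."""
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    n_groups = math.ceil(logical_processors / max_per_group)
    base, extra = divmod(logical_processors, n_groups)
    # The first `extra` groups get one additional processor.
    return [base + (1 if i < extra else 0) for i in range(n_groups)]


def usable_without_group_awareness(logical_processors: int) -> int:
    """A process that is not group aware runs all its threads inside one group."""
    return max(processor_groups(logical_processors))


if __name__ == "__main__":
    for n in (32, 128, 192):
        print(n, processor_groups(n), usable_without_group_awareness(n))
```

For 128 logical processors this model gives two groups of 64, which matches the 64-thread ceiling observed for a single AIMMS instance.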

Best answer by Mstislav Simonenkov


MarcelRoelofs
AIMMSian
June 6, 2021

Hi @afattahi

This article https://www.extremetech.com/computing/310299-how-does-windows-use-multiple-cpu-cores seems to suggest that this is a Windows limitation.

Marcel Roelofs - AIMMS Product Portfolio Architect

Mstislav Simonenkov
AIMMS Champ
June 7, 2021

Hi @afattahi

Gurobi has a parameter allowing use of multiple NUMA nodes.

I think it is called GURO_PAR_PROCGROUPS, used as GURO_PAR_PROCGROUPS 2, where 2 is the number of NUMA nodes.

Mstislav Simonenkov PhD, Head of Applied science NTS Business Service Ltd.

afattahi
Author
Enthusiast
June 7, 2021

It is indeed a Windows limitation. However, the code we intend to run can apparently be modified to schedule the job across processor groups, which is called "making the code group aware".

This article https://bitsum.com/general/the-64-core-threshold-processor-groups-and-windows/ suggests contacting the app developer to have the app made group aware.
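Being group aware starts with discovering the groups. As a minimal sketch (Python with ctypes, Windows only), the documented kernel32 functions GetActiveProcessorGroupCount and GetActiveProcessorCount report how many processor groups exist and how many logical processors each contains. Actually distributing threads across groups (for example with SetThreadGroupAffinity) is the part the application or solver vendor has to implement, and it is not shown here. The helper name is hypothetical.

```python
import ctypes
import sys


def windows_processor_groups():
    """Return the number of active logical processors in each processor group,
    or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")
    # WORD GetActiveProcessorGroupCount(void)
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    kernel32.GetActiveProcessorGroupCount.argtypes = []
    # DWORD GetActiveProcessorCount(WORD GroupNumber)
    kernel32.GetActiveProcessorCount.restype = ctypes.c_ulong
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    n_groups = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(group) for group in range(n_groups)]


if __name__ == "__main__":
    print(windows_processor_groups())
```

On a machine like the 3995WX described in the question one would expect two groups, but this sketch has not been run on that hardware.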

As @Mstislav Simonenkov suggested, apparently Gurobi has this feature.

@Mstislav Simonenkov This is great!

I searched for the GURO_PAR_PROCGROUPS parameter in Gurobi's documentation, but I could not find any information. Could you please provide more information about this parameter, or let us know how to set it through AIMMS?

Assistant professor at Utrecht University - Researcher at TNO - Developing the IESA-Opt model

Mstislav Simonenkov
AIMMS Champ
June 7, 2021

Hi @afattahi

We actively discussed the issue of NUMA nodes with Marcel Hunting. I want to thank him again.
CPLEX supports the parameter CPXPARAM_CPUmask, which gives the user control over NUMA nodes.

https://www.ibm.com/support/pages/node/6435091

However, as far as I'm aware, AIMMS currently does not support the CPXPARAM_CPUmask parameter for CPLEX.
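For reference, the CPU mask on the linked IBM page is, as I understand it, a hexadecimal string in which bit i corresponds to core i; please check that page before relying on this reading. The hypothetical helper below only does the bit arithmetic of turning a set of core indices into such a string. It does not call CPLEX, and as noted in the post AIMMS does not expose this parameter anyway.

```python
def cpu_mask_hex(cores) -> str:
    """Build a hexadecimal mask string with bit i set for every core index i.

    Assumption: bit i of the mask stands for core i, as I read the CPLEX docs.
    """
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError(f"core index must be non-negative, got {core}")
        mask |= 1 << core
    if mask == 0:
        raise ValueError("at least one core must be selected")
    return format(mask, "x")


if __name__ == "__main__":
    print(cpu_mask_hex(range(64, 128)))  # cores 64..127, i.e. the second group of 64
```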

As for the Gurobi parameter, unfortunately I do not know how to set it through AIMMS. I'm using it with other software via a Gurobi.par file.

Another solution is to switch the OS from Windows to Linux.

afattahi
Author
Enthusiast
June 7, 2021

@Mstislav Simonenkov Thank you for the accurate input.

I have not used the Linux version of AIMMS. I would prefer to stick to Windows as much as possible.

@MarcelRoelofs Would it be possible to incorporate the support for CPLEX CPU mask parameter (https://www.ibm.com/docs/en/icos/20.1.0?topic=parameters-cpu-mask-bind-threads-cores) (and the similar Gurobi parameter) in AIMMS?

Marcel Hunting
AIMMSian
June 7, 2021

@afattahi Support for NUMA machines is on our backlog, but currently it does not have the highest priority because it seems that the performance gain of using more than 32 threads is limited. See for example this discussion:

https://or.stackexchange.com/questions/5416/gurobi-and-cplex-cannot-exploit-more-than-32-cores-of-machine

Moreover, using multiple NUMA nodes to solve one MIP problem might harm the performance because then data has to be transferred through the cross-chip interconnect which adds overhead (see: https://www.cse.wustl.edu/~angelee/cse539/papers/numa.pdf).

For NUMA awareness of CPLEX, see: https://www.ibm.com/support/pages/node/6435091

@Mstislav Simonenkov GURO_PAR_PROCGROUPS seems to be an undocumented parameter. You can use a Gurobi parameter file in AIMMS by enabling the Gurobi option ‘Read Parameter File’.

Marcel - AIMMS Optimization Specialist

Mstislav Simonenkov
AIMMS Champ
June 7, 2021

@Marcel Hunting
@afattahi

I would like to share the results of some of my tests when switching from 48 to 96 threads:
MIP models: the gap improved by ~0.25% with a 24 h time limit.

LP models: significantly faster to optimality, by ~40% (barrier method).

afattahi
Author
Enthusiast
June 7, 2021

@Mstislav Simonenkov Very interesting results for the LP model. I assume you measured the improvement by solve time rather than total elapsed time. May I ask which CPU configuration you are using?

Also, if you are using Windows, could you please let us know how you distribute threads across NUMA nodes?

Mstislav Simonenkov
AIMMS Champ
Answer
June 7, 2021

@afattahi

Based on the AIMMS help, I suggest trying the following:

1) Set the Gurobi option Read Parameter File to Yes in the project options.

2) Create a text file with the desired Gurobi settings. An example is shown below:

Heuristics 0.1

GURO_PAR_PROCGROUPS 2

3) Save this text file as gurobi.prm and place it in the AIMMS project directory.

Please let me know if it helped.
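The steps above can be scripted. The minimal Python sketch below writes a parameter file named exactly gurobi.prm into a given folder, one "name value" pair per line with "#" for comments, which is the format used by the examples in this thread. The folder is an assumption: it should be the one containing the .aimms file (as Marcel Hunting points out further down), and the Read Parameter File option still has to be switched on in AIMMS. The helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def write_gurobi_prm(project_dir: Path, settings: dict) -> Path:
    """Write settings as a Gurobi parameter file called gurobi.prm in project_dir."""
    path = project_dir / "gurobi.prm"  # exact name: no hidden .txt extension
    lines = ["# Parameter settings"]
    lines += [f"{name} {value}" for name, value in settings.items()]
    path.write_text("\n".join(lines) + "\n", encoding="ascii")
    return path


if __name__ == "__main__":
    # Demo in a throwaway folder; in practice pass the AIMMS project folder.
    with tempfile.TemporaryDirectory() as tmp:
        prm = write_gurobi_prm(Path(tmp), {"Heuristics": 0.1, "GURO_PAR_PROCGROUPS": 2})
        print(prm.read_text())
```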

afattahi
Author
Enthusiast
June 7, 2021

@Mstislav Simonenkov

I followed the instructions in the AIMMS help and yours. Unfortunately, AIMMS does not apply the gurobi.prm parameters to my model.

My gurobi.prm file:

# Parameter settings

THREADS 128

GURU_PAR_PROCGROUPS 2

I placed the file in the AIMMS root project folder.

As the AIMMS help explains, the gurobi.prm file overrides the AIMMS settings. I set the Thread_limit parameter to 80. However, after the run finished, the Gurobi log file showed that only 80 threads had been used, meaning that AIMMS did not read the .prm file. Also, while it was running, I noticed that only one NUMA node was being used.

Please let me know if I am making a mistake here.

Mstislav Simonenkov
AIMMS Champ
June 7, 2021

@afattahi

Could you please try changing GURU_PAR_PROCGROUPS to GURO_PAR_PROCGROUPS.

Please let me know if it helped

afattahi
Author
Enthusiast
June 7, 2021

@Mstislav Simonenkov Ah, I made a typo in my post; I actually wrote GURO_PAR_PROCGROUPS 2. I tried both a space and a tab between the name and the number. Also, I use an academic license.

Maybe @Marcel Hunting can point out what I am missing?

afattahi
Author
Enthusiast
June 7, 2021

@Marcel Hunting Great to hear that NUMA support is on your development list. I deal with large LP problems for energy system modeling, and I almost always use the barrier method because of its relatively superior solve speed. My guess is that increasing the thread limit beyond 64 would increase solve speeds for LP problems. Of course the increase would not be linear, but it would still be a good improvement. My knowledge of this issue is very limited and mainly comes from online forums.

It would be very helpful if the support for NUMA nodes becomes available at least for LP problems.

Marcel Hunting
AIMMSian
June 7, 2021

@afattahi I have no idea what is wrong with your parameter file. If one of the parameter names is incorrect then Gurobi will still read and set the other options, so Threads should have been set to 128. Did you place ‘gurobi.prm’ in the same folder as the .aimms file?

afattahi
Author
Enthusiast
June 7, 2021

@Marcel Hunting @Mstislav Simonenkov Thank you for all your input. I have now learned how to get greater control over solver options. After some trial and error, I found my rookie mistake: I was saving the file as gurobi.prm.txt without seeing the extension.
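This kind of mistake is easy to make because Windows Explorer hides the extensions of known file types by default, so gurobi.prm.txt is displayed as gurobi.prm. A small, hypothetical check like the Python sketch below (assuming the file should sit directly in the project folder, as discussed in this thread) can catch it before a long solve.

```python
import tempfile
from pathlib import Path


def check_parameter_file(project_dir: Path) -> str:
    """Report whether project_dir contains a correctly named gurobi.prm."""
    if (project_dir / "gurobi.prm").is_file():
        return "ok"
    # Explorer may display "gurobi.prm.txt" as "gurobi.prm" when extensions are hidden.
    if (project_dir / "gurobi.prm.txt").is_file():
        return "misnamed: rename gurobi.prm.txt to gurobi.prm"
    return "missing"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "gurobi.prm.txt").write_text("Threads 128\n")
        print(check_parameter_file(folder))
```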

I was able to set the PROCGROUPS parameter successfully, but the result was disappointing.

In my LP model (barrier method), the solve time increased by 35% when moving from the default 32 threads to 128 threads. I am not sure why performance decreased.

Mstislav Simonenkov
AIMMS Champ
June 8, 2021

@afattahi
Glad we were able to help you!

Very interesting results! I achieved the best net solve time by using half of the cores (96 of 192).
