Performance – Magica Soft

Introduction

This section discusses various performance-related issues.
The content is primarily intended for programmers.

CPU dependent

MagicaCloth works with Unity DOTS (Data-Oriented Technology Stack).
Therefore, simulation performance is completely dependent on the CPU.
The GPU, by contrast, is not used at all.

DOTS is also multithreaded, so the more cores (threads) the CPU has, the more work can run in parallel and the better the performance.

Performance on mobile devices

Some caution is required on Android/iPhone.
Mobile CPUs generally combine a cluster of high-performance (big) cores with a cluster of low-power (little) cores.
This is called the big.LITTLE configuration.
For example, an 8-core CPU is in most cases split into 4 big cores and 4 little cores, which is written as (4-4 cores).
Unity runs DOTS only on the big cores.
Therefore, on such a device only 4 of the 8 cores are available to DOTS.

This problem does not occur with desktop PC CPUs.

Check with profiler

You can easily check the simulation load with Unity’s Profiler.
The simulation appears on the timeline as a MagicaManager block.
The Job item shows how the work is spread across threads.

Creation and execution of cloth data

MagicaCloth requires a variety of data to perform simulations.
This is called cloth data.
By default, cloth data is generated on the fly at runtime when it is needed.

Creating cloth data requires considerable computation and usually takes 20 ms to 100 ms.
The work runs on a background thread, so it has little effect on the main thread.
When several cloths are created at once, they are built in parallel on multiple threads.

However, the simulation cannot start until the cloth data is complete.
This causes a delay of several frames between the creation of the character and the start of the simulation.
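
To get a feel for the delay described above, the build time can be converted into frames. The sketch below is only arithmetic: `frames_of_delay` is a hypothetical helper (not part of MagicaCloth), and it assumes the simulation starts on the first frame boundary after the background build completes.

```python
import math

def frames_of_delay(build_ms, fps):
    """Frames that elapse before a background build of build_ms is ready.

    Assumption (not from MagicaCloth): the cloth becomes usable on the
    first frame boundary after the build finishes.
    """
    # build_ms / frame_ms, written as build_ms * fps / 1000 to limit rounding error
    return math.ceil(build_ms * fps / 1000)

for build_ms in (20, 100):
    print(build_ms, "ms at 60 fps:", frames_of_delay(build_ms, 60), "frames")
```

Under these assumptions, the 20 ms to 100 ms range corresponds to roughly 2 to 6 frames at 60 fps.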

Using pre-builds

Pre-build functionality was introduced in v2.5.0.
With pre-build, cloth data is created during editing and saved as an asset.
This provides the following main benefits:

Reduces load during initialization
Simulations can be started immediately

For information on pre-building, see Using pre-building.

Notes on Editor Execution

The Burst and JobSystem used by MagicaCloth carry more overhead when running in the editor than in a build.
Therefore, please note that profiler results in the editor are not the same as in a build.
This is due to the following factors.

Burst JIT Compiler

Only when running in the editor, Burst code is compiled at runtime by a Just-In-Time (JIT) compiler.
Because this happens after play has started, the first use of MagicaCloth incurs a compile time of several hundred ms or more.
As a result, in the editor there is a significant delay before the first simulation starts after entering play.
This problem occurs only in the editor, not in builds.

To work around this problem, enable Enter Play Mode Options.
The setting is located on the Editor tab of Project Settings.

With Enter Play Mode Options enabled, Burst is not JIT compiled again on subsequent plays.

JobsDebugger processing load

In the editor, JobsDebugger constantly monitors job execution.
This makes jobs take longer than usual to run and introduces unnatural gaps between jobs.
If you are concerned about the load, turn off JobsDebugger.

SafeCheck processing load

Similarly, Burst safety checks are active in the editor environment.
These checks also add a certain amount of load, so turn off the following two checks if you are concerned.

Note, however, that with JobsDebugger and SafeCheck turned off, Burst/Jobs errors are no longer reported.
Therefore, if MagicaCloth does not appear to be working properly, turn all checks back on and look for errors.

Recommended to test on build

As mentioned above, simulation performance is lower in the editor because of the various monitoring features.
Release builds remove all of this monitoring.
Therefore, it is best to make a build and check performance on the actual device.

Notes on build

Burst AOT Settings

Don’t forget to enable Burst when building.
This is done from Burst AOT Settings in Project Settings.
If the option is unchecked, the build is produced with Burst disabled.
It is normally enabled by default.

IL2CPP recommended

We also strongly recommend using IL2CPP when building.
C# code runs considerably faster under IL2CPP than under Mono.

List of processing loads

This section rates the processing load of MagicaCloth’s main functions.
The more ★, the higher the load.

Cloth data construction method

Runtime build (default)	★★★	A runtime build creates cloth data on the spot when it is needed. This increases the load during initialization. The data is created in the background, but that work still consumes CPU.
Pre-build	★	A pre-build creates cloth data during editing and saves it as an asset. This greatly reduces the initialization load, and no background processing is needed.

Cloth type

MeshCloth	★★★★	MeshCloth is considerably more demanding than BoneCloth because it involves proxy mesh skinning and writing back to the render mesh in addition to simulation. Therefore, please pay attention to performance when using mobile devices.
BoneCloth	★	BoneCloth is very lightweight. In most cases, it can be used in large quantities without causing problems.

Collision processing

Self Collision	★★★★★★★★★★	Self-collision is by far the most demanding of all functions. It is therefore intended mainly for desktop PCs with a large number of CPU cores. If used on a mobile device, reduce the number of vertices in the proxy mesh as much as possible and pay close attention to performance.
Mutual collision	★★★★★★★★	Mutual collision is slightly less demanding than self-collision because it only determines collision with the other party. However, since the process is no different from self-collision, please pay close attention to performance here as well.
Edge Collision	★★★★	Edge collisions are several times more demanding than point collisions. Try to use this only when there is a problem with point collision.
Point Collision	★★	Point collisions have a far lower processing load than other collision determinations.
Backstop	★	Backstop has the lowest processing load because it requires only a few calculations. It can be used without concern.

Simulation frequency and maximum number of updates

MagicaCloth manages its own time, so the simulation runs at a different timing from Unity’s frame update.
It runs at regular intervals, independent of the frame rate.

The rate of these updates is called the simulation frequency.
For example, if the frequency is 90, the simulation updates every 1/90th of a second.
This is the same relationship as the one between Unity’s physics engine update (FixedUpdate) and the frame update.
The default frequency in MagicaCloth is 90.
In other words, the simulation updates 90 times per second.

There is also a limit on the number of simulation updates that can be executed in one frame.
This is a safety feature that prevents the simulation from repeating endlessly under excessive load.
The default limit in MagicaCloth is 3.
If simulation updates are skipped because of this limit, positions are filled in by an interpolation function.
This interpolation is simple and not very accurate, so keep in mind that artifacts may occur when updates are skipped.
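
The interaction between frequency and the per-frame limit can be sketched with a generic fixed-timestep accumulator. This is an illustrative model, not MagicaCloth's actual implementation: `steps_for_frame` is a hypothetical function, and it assumes that updates beyond the limit are dropped rather than carried over to later frames.

```python
from fractions import Fraction

def steps_for_frame(accumulator, frame_dt, frequency, max_updates):
    """Advance one frame; return (steps_run, steps_skipped, new_accumulator).

    Illustrative model only: steps beyond max_updates are assumed to be
    dropped (their time is discarded), not queued for later frames.
    """
    step = Fraction(1, frequency)      # fixed simulation interval in seconds
    accumulator += frame_dt            # time owed to the simulation
    due = accumulator // step          # whole steps that fit in that time
    run = min(due, max_updates)        # cap the work done in this frame
    skipped = due - run
    accumulator -= due * step          # keep only the sub-step remainder
    return run, skipped, accumulator

# Frequency 90, limit 3: three frames at 60 fps, then one 100 ms hitch.
acc = Fraction(0)
for dt in (Fraction(1, 60), Fraction(1, 60), Fraction(1, 60), Fraction(1, 10)):
    run, skipped, acc = steps_for_frame(acc, dt, 90, 3)
    print(f"frame dt={float(dt):.4f}s  run={run}  skipped={skipped}")
```

In this model, 60 fps frames alternate between 1 and 2 updates at frequency 90, while a 100 ms frame owes 9 updates, of which 3 run and 6 are skipped.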

Frequency and performance

Simulation frequency is directly related to performance.
Lowering the frequency improves performance because fewer simulation updates are run.
However, frequency also has a significant impact on accuracy, so lowering it makes the simulation less accurate.
Keep in mind this trade-off between performance and simulation accuracy.

Change frequency and maximum number of updates

The frequency and maximum number of updates can be changed in two ways.
Changes are possible at any time.

API
Both values can be changed from script through the API.

MagicaSettings
A dedicated component called MagicaSettings is provided to change the state of the system.
By using this component, you can change the frequency and maximum number of updates without coding.
Please refer to the MagicaSettings document for the setting method.

Effect of frequency on behavior

Changing the frequency slightly changes the simulation behavior.
For example, if you tune the movement at frequency 90 and then set the frequency to 30 or 150, the movement will not be exactly the same.
This is because the frequency slightly alters how each parameter takes effect.
Therefore, a change in frequency may require readjustment of parameters.

Setting Example

Here are some configuration examples.

PRIORITIZE PERFORMANCE

If performance is important, try setting the frequency to 60 and the maximum number of updates to 2.
The simulation is slightly less accurate, but performance improves.

FIXED FRAME RATE

If the game runs at a fixed frame rate such as 60fps, matching the frequency to it is also effective.
For example, set the frequency to 60, the same as the fps, and limit the maximum number of updates to 1.
This keeps the per-frame load stable.

Likewise, if your game runs at 30fps, a frequency of 60 and a maximum of 2 updates works well.
The simulation then updates twice per frame, so the accuracy of frequency 60 is retained even at 30fps.
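
One way to reason about these fixed-frame-rate settings is to check whether the frequency divides evenly into the frame rate. The sketch below is a hypothetical helper, not MagicaCloth API, and it assumes the frame rate is held exactly.

```python
from fractions import Fraction

def check_settings(frequency, fps, max_updates):
    """Return (updates_per_frame, is_stable) for an exactly fixed frame rate."""
    per_frame = Fraction(frequency, fps)
    # Stable: the same whole number of updates every frame, within the limit.
    is_stable = per_frame.denominator == 1 and per_frame <= max_updates
    return per_frame, is_stable

for frequency, fps, max_updates in ((60, 60, 1), (60, 30, 2), (90, 60, 3)):
    per_frame, stable = check_settings(frequency, fps, max_updates)
    print(f"{frequency} Hz at {fps} fps: {per_frame} updates/frame, stable={stable}")
```

Both configurations above come out as a whole number of updates per frame (1 and 2), whereas frequency 90 at 60fps averages 3/2 updates per frame, so the per-frame load varies.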

PERFORMANCE FIRST

If performance is your top priority, try setting the frequency to 30 and the maximum number of updates to 1.
This gives the best performance.
However, be aware that accuracy is considerably reduced.
This setting accepts artifacts in exchange for performance.

Culling System

Culling improves performance by stopping the simulation of characters that are not visible to the camera or that are more than a certain distance from it.

This feature greatly improves performance in first-person games (FPS) and VR.
Culling consists of two features: camera culling and distance culling.

Please refer to the culling system documentation for details.

Character placement

DOTS also uses multithreading for reading and writing Transforms.
However, to benefit from this, you need to be careful about how you place your characters.
In DOTS, Transform processing is split across threads per GameObject group placed at the root of the hierarchy.

For example, if all 10 characters are placed at the root, the Transform processing for each character runs on multiple threads.
This is the ideal arrangement.

However, if all characters are placed as children of a single “CharacterGroup” object, they form one group under one root.
This is a very bad arrangement, because the Transform processing is not multithreaded at all.
The performance loss is particularly noticeable when there are a large number of characters.