I recently received a load-testing task for a data warehouse reporting POC (amusingly, the vendor is also named POC). This post records the problems I ran into during testing, along with my thoughts and analysis.
2. Test environment architecture diagram
Load generation path: LR simulates business users >> xxBI reporting system >> PostgreSQL cluster
3. Issues and Analysis
Copying files to the PostgreSQL cluster's storage nodes
The PostgreSQL cluster's four servers are managed through a unified management node (the load generator cannot connect to them directly). To monitor the target servers I needed to get the nmon binary onto them, but I could only reach the PostgreSQL nodes by jumping from the management node in xshell (ftp is not installed on them), and xftp only opens a transfer window to the management node itself.
Workaround: Use the scp command
scp nmon [email protected]:nmon (run on the management node: copies the local nmon file to the specified user's directory on the target server)
scp [email protected]:baobiao1_10vu.nmon /home/admin/baobiao1_10vu_111.nmon (copies the nmon result file from the remote host to the current user's directory and renames it)
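If the SSH client on the local machine is OpenSSH 7.3 or newer, the two-hop copy above can be collapsed into a single command with `-o ProxyJump`. A minimal sketch, where `mgmt` (management node), `db1` (target PostgreSQL node), and the `admin` user are placeholder names, not the real hosts:

```shell
# Push nmon to a PostgreSQL node through the management node in one hop.
# "mgmt", "db1" and "admin" are placeholders for the real hosts/user.
scp -o ProxyJump=admin@mgmt nmon admin@db1:/home/admin/nmon

# Pull the result file back the same way, renaming it locally.
scp -o ProxyJump=admin@mgmt admin@db1:/home/admin/baobiao1_10vu.nmon \
    /home/admin/baobiao1_10vu_111.nmon
```

This avoids staging the files on the management node at all; the management node only relays the SSH connection.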
GC problems encountered during load testing
During a single-transaction load test we hit a stop-the-world (STW) garbage collection. First, look at the xxBI server's resource consumption graph:
At roughly the halfway point of the scenario, 9 Full GCs occurred; CPU dipped during each GC, and logical disk reads jumped several-fold. After stopping the scenario and re-running it, the xxBI server's resource consumption graph looked like this:
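The Full GC count above came from the monitoring graph, but the same number can be pulled straight out of the JVM's GC log, assuming the xxBI JVM was started with `-Xloggc:gc.log -XX:+PrintGCDetails`. A minimal sketch using a fabricated sample log (the log lines below are illustrative, not real output):

```shell
# Hypothetical excerpt of a ParNew+CMS GC log; a real one comes from starting
# the JVM with -Xloggc:gc.log -XX:+PrintGCDetails.
cat > gc.log <<'EOF'
0.321: [GC (Allocation Failure) [ParNew: 104960K->11519K(118016K)] ...]
5.812: [Full GC (Allocation Failure) [CMS: 512000K->498000K(524288K)] ...]
9.004: [Full GC (Allocation Failure) [CMS: 520000K->501000K(524288K)] ...]
EOF

# Count Full GC pauses seen during the scenario:
grep -c "Full GC" gc.log   # prints 2
```

Cross-checking this count against the dips on the resource graph confirms whether the CPU troughs really line up with Full GC pauses.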
Then look at the TPS trend in LR:
The report-query transaction in the Action section was not executed at all.
I made several attempts with different reports and every one of them hit this problem. So what caused it?
My first instinct was GC: for example, an unsuitable garbage collector, or a large heap (for which G1 is usually recommended; exactly why G1 suits large heaps I'll cover in a dedicated GC write-up later). The common combination is ParNew + CMS; imagine a Full GC where the total size of live young-generation objects plus old-generation garbage is very large, which results in a very long STW pause.
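For reference, this is roughly how the two setups discussed above differ on the command line. The heap size, jar name, and log path are placeholders, not taken from the actual xxBI deployment; the flags shown are the JDK 8 spellings:

```shell
# ParNew + CMS, the combination mentioned above (JDK 8 flags;
# heap size, jar name and log path are assumptions):
java -Xms8g -Xmx8g \
     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
     -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar xxbi-server.jar

# G1 instead targets a pause-time goal and copes better with large heaps:
java -Xms8g -Xmx8g \
     -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
     -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar xxbi-server.jar
```

The key difference is that G1 works in regions and collects incrementally against the `MaxGCPauseMillis` goal, so a single pause rarely has to walk the whole heap the way a ParNew+CMS Full GC does.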
If the xxBI system were using G1, then after a Full GC a re-run of the scenario should recover, and TPS would not stay flat at zero. More likely, the Full GC had wiped out a cache, causing cache misses. Our load-test script logs in anonymously (the load generator's IP is whitelisted on xxBI, so reports can be queried without logging in); I suspected this feature and had it temporarily disabled. I then logged into xxBI normally from a browser and queried a report: the query worked fine. After I quit, Task Manager showed CPU at about 30%. Huh? No load test was running, so why? My guess was that this manual operation had triggered the cache. Once CPU dropped I retested: the TPS curve was normal at first, but a long-duration test produced the Full GC again, and then . . . . . .
In the end we still had to bring in the xxBI vendor to troubleshoot. In fact, the day this phenomenon first appeared may have been the same day the PostgreSQL cluster team adjusted some memory-related parameters; the person in charge of the PostgreSQL cluster rolled those parameters back, but retesting still showed the problem.