Hi,
One of my customers has a Virtual Center 6.0 environment that manages approximately 1,200 virtual machines and 36 hosts. Periodically, events stop updating. No new events appear in the Virtual Center console (Web or legacy client) or through Web Services, and only older events are displayed. Tasks sometimes still appear, but are often significantly delayed. Restarting the vpxd service temporarily alleviates the problem.
The last entry displayed in the Virtual Center console is shown in the attached screenshot.
The vpxd log shows many entries like this one:
2016-01-07T17:19:46.983-08:00 warning vpxd[33020] [Originator@6876 sub=Default opID=HB-host-50349@41380-1cb697c3] [VdbStatement] SQL execution took too long: UPDATE VPX_VM SET SUSPEND_TIME = ? , BOOT_TIME = ? , SUSPEND_INTERVAL = ? , QUESTION_INFO = ? , MEMORY_OVERHEAD = ? , TOOLS_MOUNTED = ? , MKS_CONNECTIONS = ? , ONLINE_STANDBY = ? , FAULT_TOLERANCE_STATE = ? , RECORD_REPLAY_STATE = ? , IS_CONSOLIDATE_NEEDED = ? , IS_QUIESCED_FORK_PARENT = ? , OFFLINE_FEATURE_REQUIREMENT = ? , FEATURE_REQUIREMENT = ? , FEATURE_MASK = ? , PAUSED = ? , SNAPSHOT_IN_BACKGROUND = ? WHERE ID = ?
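To see which tables the slow statements actually hit, a small Python sketch like the one below could tally the "SQL execution took too long" warnings in a vpxd log by target table. It assumes every warning follows the exact layout of the entry above; the regular expressions and the function name slow_sql_by_table are my own illustration, not part of any VMware tool.

```python
import re
from collections import Counter

# Matches the vpxd warning layout quoted above (assumption: other vpxd
# builds or log levels may format these lines differently).
SLOW_SQL = re.compile(
    r"^(?P<ts>\S+)\s+warning\s+vpxd\[\d+\].*?"
    r"\[VdbStatement\] SQL execution took too long: (?P<sql>.*)$"
)

# Takes the first table name after UPDATE, INTO or FROM; good enough for
# simple statements, not a real SQL parser.
TABLE = re.compile(r"\b(?:UPDATE|INTO|FROM)\s+([A-Z0-9_.]+)", re.IGNORECASE)


def slow_sql_by_table(lines):
    """Count slow-SQL warnings per target table in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SLOW_SQL.match(line)
        if not match:
            continue  # not a slow-SQL warning
        table = TABLE.search(match.group("sql"))
        counts[table.group(1).upper() if table else "<unknown>"] += 1
    return counts
```

Passing it an open vpxd log file and printing the most_common() entries of the returned Counter would show whether statements against VPX_EVENT and VPX_EVENT_ARG dominate, or whether other tables (such as VPX_VM in the sample above) are slow as well.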
The server hosting Virtual Center is a physical Windows Server 2012 R2 machine with 80 GB of memory (40 GB in use). CPU utilization is not high, and disk space is sufficient.
We opened a ticket with VMware support and found approximately 90 million rows in the VPXADMIN.VPX_EVENT_ARG table and about the same number of rows in the VPXADMIN.VPX_EVENT table. Each table is about 64 GB in size. The database is Oracle hosted on Exadata. The DBA has already reviewed the database and found no evident bottleneck on the database side. The stored procedure that prunes these tables according to the vCenter retention policy has been running successfully every 6 hours. The current retention policy is 30 days, and VMware support suggested reducing it to 7 days.
The customer would like to keep the current retention policy, so we are looking for ways to tune the Virtual Center server to handle this event volume. Also, 95% of the rows in these two tables are events related to triggered CPU/memory alarms. Is there a known, supported mechanism to prune these alarm events specifically from those tables without affecting the rest of the event history?
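For a rough sense of what each option would buy, here is a back-of-envelope Python sketch using the figures above (about 90 million rows and 64 GB per table at 30-day retention, with 95% of rows being alarm events). It assumes a constant daily event rate and a uniform average row size, neither of which has been measured, so the results are order-of-magnitude only; project_event_rows is a hypothetical helper.

```python
def project_event_rows(rows, size_gb, retention_days, new_retention_days,
                       keep_fraction=1.0):
    """Linearly project row count and size of one event table.

    Assumes events arrive at a constant daily rate and that every row has
    the same average size (both simplifications, not measurements).
    keep_fraction is the share of rows that would survive extra pruning.
    """
    rows_per_day = rows / retention_days
    new_rows = rows_per_day * new_retention_days * keep_fraction
    new_size_gb = new_rows * size_gb / rows  # same average bytes per row
    return new_rows, new_size_gb


# Figures from this thread, per table: ~90M rows, ~64 GB, 30-day retention,
# ~95% CPU/memory alarm events (so 5% of rows would remain after pruning them).
scenarios = [
    ("30 days, all events (today)", 30, 1.0),
    ("7 days, all events", 7, 1.0),
    ("30 days, alarm events pruned", 30, 0.05),
]
for label, days, keep in scenarios:
    new_rows, new_gb = project_event_rows(90e6, 64, 30, days, keep)
    print(f"{label}: ~{new_rows / 1e6:.1f}M rows, ~{new_gb:.1f} GB")
```

Under those assumptions, pruning only the alarm events while keeping 30-day retention would leave each table at roughly a fifth of the size of the 7-day option (about 3.2 GB versus about 14.9 GB).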
Thanks for any suggestions.
Added log and screenshots.