APS (PDW) – Extracting Load and Query Stats

APS (PDW) – Extracting Load and Query Stats

Hi, this is a short blog post that may be useful to users of the PDW to get a full list of load statistics and query statistics. The main area to get statistics is the PDW dashboard, but in a lot of cases this is not enough. It is even worse if best practice has not been implemented and labels for queries are not used then the dashboard becomes less use than a chocolate teapot.
So in order to extract load information from the APS the following query is rather useful, note that this will only pull back information on backups, restores and loads, if you loaded data using “insert into” for example information would not show in the results.

SELECT

r.[run_id], r.[name], r.[submit_time], r.[start_time], r.[end_time], r.[total_elapsed_time], r.[operation_type],

r.[mode], r.[database_name], r.[table_name], l.[name], r.[session_id], r.[request_id], r.[status], r.[progress], case when r.[command] is null

then q.[command] else r.[command] end as [command], r.[rows_processed], r.[rows_rejected], r.[rows_inserted] from sys.pdw_loader_backup_runs r

join sys.sql_logins l on r.principal_id = l.principal_id

left outer join sys.dm_pdw_exec_requests q on r.[request_id] = q.[request_id] where r.[operation_type] = ‘LOAD’

–AND l.[name] = ‘someusername’

order by CASE UPPER(r.[status])

WHEN ‘RUNNING’ THEN 0 WHEN ‘QUEUED’ THEN 1 ELSE 2 END ASC , ISNULL(r.[submit_time], SYSDATETIME())

DESC OPTION (label = ‘Rd_DataLoads’)

Note in the preceding statement a line is commented out. This line can be used to find loads completed by a specific user. The DMV for loads sys.pdw_loader_backup_runs stores all loads over time and persists after a region restart. Again best practice should be in place where users are logging into the PDW with their own user (or windows auth if possible) NOT sa! Finally note the use of labels:

OPTION (label = ‘some comment in here’) 

Labels can then be used either in the queries you use on this page or even through the dashboard where you can see a column for labels. I would recommend your team applying some naming conventions or standards for labelling. Next query below is useful for pulling back a list of all queries that have been executed on the APS:

select q.[request_id], q.[status], q.[submit_time] as start_time, q.[end_time], q.[total_elapsed_time], q.[command], q.[error_id], q.[session_id], s.[login_name], q.[label]

from sys.dm_pdw_exec_requests q inner join sys.dm_pdw_exec_sessions s on s.[session_id] = q.[session_id]

where LEFT(s.client_id, 9) <> ‘127.0.0.1’

order by [start_time] desc OPTION (label = ‘Rd_Query_history’)

This will give you a list of all queries performed by all users of the APS, however the DMV used: sys.dm_pdw_exec_requests only stores up to 10,000 rows so depending on the usage of the APS the query above will only give you a very recent snapshot of query performance. My recommendation for both of the above queries would be to set up SQL Agent job on the loading server to extract these stats, from the PDW, into a dedicated stats database on the loading server. You could then use SSRS or any other tool to do some proactive monitoring of large/long loads and queries for example. At worst you have a nice log of data over time should you start to get feedback about degrading performance for example.