Creating a tool that can structure textual data and extract points of interest for domain experts remains a challenge. Most integrated text analysis engines offer limited capabilities and therefore require a great deal of additional analysis [1]. A decision maker sometimes has to spend several hours looking for the causes of undesired trends in the data, such as strong fluctuations in the user satisfaction rate or the absence of user feedback over a long period of time. In such a scenario, a domain expert must be familiar with the predictive model itself in order to work out the set of possible causes, apply filters to the data based on these preliminary conclusions, and carefully inspect each textual document to finally find something that could serve as a reasonable basis for a decision. It is also possible to manually create a dictionary of keywords expressing different user moods (e.g. [should: user suggestion]), create clusters based on this dictionary, and analyze each cluster separately for a specific purpose. This approach, which trades latency for accuracy, may be applied to a relatively small dataset but not to streaming operational data, as the model has to be redefined manually for each stream.
To overcome this problem, one may use embedded agents with a conversational interface that allow queries to be expressed in natural language. Thus, in order to accelerate data analysis, it is possible to create a gateway to any reporting application or service that provides a meta layer of intelligence able to arbitrate between tabs for a given user query [2]. Such solutions have been demonstrated to be highly effective in a wide range of tasks, for example suggesting an action to the user based on the events the agent has been tracking (proactive assistance) or responding to the user’s explicit spoken or typed request (reactive assistance). A prominent example of such a solution is the Power BI Question and Answer service (https://docs.microsoft.com/en-us/power-bi/power-bi-q-and-a), which is based on Microsoft’s Cortana. In many cases, developers struggle to find the optimal balance between latency and accuracy. For instance, Cortana’s automation level ranges from fully automated dialogs to human-in-the-loop, the latter allowing more complex queries to be handled by a human agent [2]. This approach proves efficient for simple tasks such as creating an alarm or, in our case, showing specific report visuals. Thus a selling point of such systems, as is often claimed, is that they enable users to get many things done via a single entry point, i.e. they replace the search for a specific item [3].
However, when the task goes beyond the scope of mere conversation modeling, the results produced by the technologies described above may not match the user's expectations. An example of such a scenario is summarization systems for primary care physicians [4], where accuracy and speed are the key concerns. The most common approach, based on entities and intents, requires manual training of a domain-specific intent and slot model. In other words, the model is defined by the user's domain of expertise, as different users may be interested in different report items.
As explained above, proactive assistance consists in tracking user activity and making suggestions based on an analysis of the activity log.
To achieve this goal, one may apply the usage metrics provided by Microsoft Power BI (https://docs.microsoft.com/en-us/power-bi/service-usage-metrics), which help to track how reports are being used throughout an organization: what exactly is being used, by whom, and for what purpose. These metrics capture activity data by visited page and by user. Thus, for each user and each page, we construct a time series containing the number of visits per day. Then, using autoregressive moving average (ARMA) models, we predict the number of visits for the next day, and the page with the highest predicted number of visits is chosen as the most likely candidate. Finally, the filter is obtained from a predefined dataset of filters. For instance, if the "Topic Distribution" page has been selected, the filters dataset will contain a corresponding filter, such as "Select all Comments having the Weight greater than zero". The generated filter is applied to the initial dataset and the result is stored in temporary storage.
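The page-selection and filtering steps can be illustrated with a minimal sketch. It is not the implementation used in this work: for brevity, a first-order autoregressive model AR(1), fitted by least squares, stands in for a full ARMA fit, and the function names, the "Sentiment Overview" page, and the sample visit counts are hypothetical. Only the "Topic Distribution" page and its "Weight greater than zero" filter come from the description above.

```python
from typing import Callable, Dict, List, Tuple


def forecast_next(series: List[float]) -> float:
    """One-step-ahead forecast with AR(1): x[t] = c + phi * x[t-1].

    Simplified stand-in for ARMA (assumption): no moving-average term,
    coefficients estimated by ordinary least squares.
    """
    if not series:
        return 0.0
    if len(series) < 3:
        # Too few points to fit a slope; fall back to the mean.
        return sum(series) / len(series)
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        # Constant history: the best guess is that constant.
        return mean_y
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov / var_x
    c = mean_y - phi * mean_x
    return c + phi * series[-1]


def most_likely_page(
    visits_by_page: Dict[str, List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Forecast next-day visits per page and pick the highest forecast."""
    forecasts = {page: forecast_next(s) for page, s in visits_by_page.items()}
    return max(forecasts, key=forecasts.get), forecasts


# Hypothetical stand-in for the predefined filters dataset: page -> predicate.
PAGE_FILTERS: Dict[str, Callable[[dict], bool]] = {
    # "Select all Comments having the Weight greater than zero"
    "Topic Distribution": lambda row: row["Weight"] > 0,
}


def apply_page_filter(page: str, rows: List[dict]) -> List[dict]:
    """Apply the filter associated with the selected page to the dataset."""
    predicate = PAGE_FILTERS[page]
    return [row for row in rows if predicate(row)]


if __name__ == "__main__":
    # Daily visit counts for one user (made-up numbers).
    visits = {
        "Topic Distribution": [3, 4, 5, 6, 7, 8, 9],
        "Sentiment Overview": [5, 5, 4, 5, 4, 5, 4],
    }
    page, forecasts = most_likely_page(visits)
    comments = [
        {"Comment": "should add export", "Weight": 0.4},
        {"Comment": "ok", "Weight": 0.0},
    ]
    print(page, forecasts, apply_page_filter(page, comments))
```

In practice the per-page forecast would be replaced by a properly fitted ARMA model, and the filtered rows would be written to the temporary storage rather than printed.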
[1] S. Chaudhuri, U. Dayal, and V. Narasayya. An overview of business intelligence technology. Commun. ACM, 54(8):88-98, Aug. 2011.
[2] R. Sarikaya, P. A. Crook, A. Marin, M. Jeong, J.-P. Robichaud, A. Celikyilmaz, Y.-B. Kim, A. Rochette, O. Z. Khan, D. X. Liu, D. Boies, T. Anastasakos, Z. Feizollahi, N. Ramesh, H. Suzuki, R. Holenstein, E. Krawczyk, and V. Radostev. An overview of end-to-end language understanding and dialog management for personal digital assistants. IEEE, December 2016.
[3] R. Sarikaya. The technology powering personal digital assistants. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015.
[4] D. M. A. L. S. R. S. Margalit R. S., Roter D. Electronic medical record use and physician-patient communication: an observational study of Israeli primary care encounters. Patient Education and Counseling, 1, 2006.