Analysis of the Paper "Is GPT-4 a Good Data Analyst?"

张晓龙 / 2023-06-09

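These notes grew out of our recent exploration of applying ChatGPT to data analysis. At the end of May we built a demo and put it to initial use. For details, see: "A Real Case of the Initial Application of ChatGPT to Data Analysis in an Online Education Business"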

Having finished analyzing the paper, I compared it with the real business case analysis I did earlier:

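Our current conclusion is consistent with the paper: GPT can automate many steps of the analysis process, but business understanding and data insight still need further improvement before it can be truly applied in business.
The paper works on relatively well-defined datasets and questions and does not consider much of the complexity found in real business.
The logic and complexity of the specific business determine how hard an analysis topic is, and real business involves more uncertainty.
Our demo was validated on GPT-3.5, which is less capable than GPT-4, so the demo needs further optimization.
On the question of replacing analysts, my view is that the analysts who get replaced in the future will be those who cannot use GPT.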

Paper Key Takeaways

Result

Experimental results show that GPT-4 can achieve comparable performance to humans.
The main and typical job scopes for a data analyst include extracting relevant data from several databases based on business partners’ requirements, presenting data visualization in an easily understandable way, and also providing data analysis and insights for the audience.
The results and analysis show that GPT-4 can achieve comparable performance to humans, but further studies are needed before concluding that GPT-4 can replace data analysts.

Experimental results show that GPT-4 can beat an entry-level data analyst in terms of performance and achieve performance comparable to a senior-level data analyst. In terms of cost and time, GPT-4 is much cheaper and faster than hiring a data analyst, but further studies are needed before concluding that GPT-4 can replace data analysts.

Task and Dataset

Tasks and datasets:

Text-to-SQL: we adopt the NL2VIS task, which goes one step further than the text-to-SQL task.
chat summarization: we aim to generate the data analysis in the form of bullet points instead of a short paragraph.
Task setting: as illustrated in Figure 1, given a business-related question (q) and one or more relevant database tables (d) and their schema (s), we aim to extract the required data (D), generate a graph (G) for visualization and provide some analysis and insights (A).

Framework

Basically, there are three steps involved:

(1) code generation (shown in blue arrows)

(2) code execution (shown in orange arrows)

and (3) analysis generation (shown in green arrows)

Algorithm 1 GPT-4 as a data analyst

Prompt

1. Prompt for the first step in our framework:

Write python code to select relevant data and draw the chart. Please save the plot to “figure.pdf” and save the label and value shown in the graph to “data.txt”.

An example code snippet generated by GPT-4 is shown

2. Obtain the data offline, to avoid leaking data

3. Prompt for the third step in our framework:

Generate analysis and insights about the data in 5 bullet points.
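
The three steps and the two prompts above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `llm` is a hypothetical callable that takes a prompt string and returns the model's reply, the `Question:` / `Schema:` / `Data:` layout of the prompts is an assumption, and `run_pipeline` and `generated.py` are names invented for this sketch. Only the two quoted prompt sentences come from the notes above.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable

# Prompt sentences quoted from the notes above.
CODE_PROMPT = (
    "Write python code to select relevant data and draw the chart. "
    'Please save the plot to "figure.pdf" and save the label and value '
    'shown in the graph to "data.txt".'
)
ANALYSIS_PROMPT = "Generate analysis and insights about the data in 5 bullet points."


def run_pipeline(question: str, schema: str, llm: Callable[[str], str], workdir: Path) -> str:
    """Run code generation, offline code execution, then analysis generation."""
    # Step 1: code generation. Only the question and schema go to the model.
    # The prompt layout is an assumption made for this sketch.
    code = llm(f"Question: {question}\n\nSchema:\n{schema}\n\n{CODE_PROMPT}")

    # Step 2: code execution. The generated script runs locally, so the
    # database itself is never sent to the model.
    script = workdir / "generated.py"
    script.write_text(code, encoding="utf-8")
    subprocess.run([sys.executable, script.name], cwd=workdir, check=True, timeout=60)
    data = (workdir / "data.txt").read_text(encoding="utf-8")

    # Step 3: analysis generation from the question and the extracted data.
    return llm(f"Question: {question}\n\nData:\n{data}\n\n{ANALYSIS_PROMPT}")
```

In practice `workdir` would be a scratch directory (for example one created with `tempfile.TemporaryDirectory()`), and generated code should be reviewed or sandboxed before it is executed.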

Experiments

Randomly choose 100 questions from different domains with different chart types and different difficulty levels to conduct our main experiments. The chart types cover the bar chart, the stacked bar chart, the line chart, the scatter chart, the grouping scatter chart and the pie chart.

The difficulty levels include: easy, medium, hard and extra hard.

The domains include: sports, artists, transportation, apartment rentals, colleges
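
The selection described above can be approximated with a seeded random draw plus a coverage count. This is only a sketch: the notes do not say how the paper drew its sample, and the record fields (`chart`, `difficulty`) and function names are assumptions made for illustration.

```python
import random
from collections import Counter

CHART_TYPES = ["bar", "stacked bar", "line", "scatter", "grouping scatter", "pie"]
DIFFICULTIES = ["easy", "medium", "hard", "extra hard"]


def sample_questions(pool: list[dict], k: int = 100, seed: int = 0) -> list[dict]:
    """Draw k distinct questions from the pool; the seed makes the draw repeatable."""
    return random.Random(seed).sample(pool, k)


def coverage(sample: list[dict], field: str) -> Counter:
    """Count how many sampled questions fall under each value of the given field."""
    return Counter(item[field] for item in sample)
```

Calling `coverage(sample, "chart")` and `coverage(sample, "difficulty")` on the result shows whether every chart type and difficulty level is represented in the draw.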

Evaluation

Figure evaluation metrics

information correctness: is the data and information shown in the figure correct?
chart type correctness: does the chart type match the requirement in the question?
aesthetics: is the figure aesthetic and clear without any format errors?

Analysis evaluation metrics

correctness: does the analysis contain wrong data or information?
alignment: does the analysis align with the question?
complexity: how complex and in-depth is the analysis?
fluency: is the generated analysis fluent, grammatically sound and without unnecessary repetitions?
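
Once annotators have scored each sample on these metrics, comparing GPT-4 with a human analyst comes down to averaging per metric. The sketch below assumes each rating is a dictionary from metric name to a numeric score. The notes do not give the scoring scales, so the numbers in the usage note are placeholders, and `average_scores` is a name invented here.

```python
from statistics import mean

FIGURE_METRICS = ("information correctness", "chart type correctness", "aesthetics")
ANALYSIS_METRICS = ("correctness", "alignment", "complexity", "fluency")


def average_scores(ratings: list[dict[str, float]], metrics: tuple[str, ...]) -> dict[str, float]:
    """Average each listed metric over all rated samples."""
    return {m: mean(r[m] for r in ratings) for m in metrics}
```

For example, `average_scores([{"correctness": 1, "alignment": 1, "complexity": 2, "fluency": 3}, {"correctness": 0, "alignment": 1, "complexity": 3, "fluency": 3}], ANALYSIS_METRICS)` averages two placeholder ratings, and running it once per analyst gives the rows of a comparison table.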

Evaluation results: GPT-4 vs. senior human data analysts

GPT-4’s performance is comparable to human data analysts, while the superiority varies among different metrics and human data analysts.
Vs. an analyst with 6 years of data analysis experience in the finance industry: we can see from the table that GPT-4’s performance is comparable to the expert data analyst on most of the metrics.
Vs. an analyst who has worked in the internet industry as a data analyst for over 5 years: the results show larger variance between human and AI data analysts. The human data analyst surpasses GPT-4 on information correctness and aesthetics of figures, and on correctness and complexity of insights, indicating that GPT-4 still has potential for improvement.
Vs. a junior data analyst with less than 2 years of data analysis experience at a consulting firm: GPT-4 not only performs better on the correctness of figures and analysis, but also tends to generate more complex analysis than the human data analyst.

Case study

Example comparing GPT-4 and human data analysts

Limitations:

We suspect that GPT-4’s calculation ability is not strong, especially for complex calculations.
We also notice this issue in several other cases. Although GPT-4 generates the analysis bullet points in a very confident tone, the calculation is sometimes inaccurate.
While GPT-4 usually focuses only on the extracted data itself, the human analyst readily connects the data with their own background knowledge.
First, as illustrated in the case study section, GPT-4 still has hallucination problems, which is also mentioned in GPT-4 technical report (OpenAI, 2023).
Second, before providing insightful suggestions, a professional data analyst is usually confident about all the assumptions. Instead of directly giving any suggestion or making any guess from the data, GPT-4 should be careful about all the assumptions and make the claims rigorous.