Language: English Translation (edx-datascientist 1.5-1.8)

xiaoxiao 2021-02-28

 

1.5 Lecture: How did you become interested in Data Science?


I began my work as an undergraduate, where I studied comparative literature, and I was also very interested in math and computer science. It was only when I finished my undergraduate degree that I learned about the field of natural language processing, where I could bring the two interests together. Around the '90s, the natural language processing field began exploding with data. We were gathering data online, and it was about this time that people began collecting corpora that we could develop systems on, train them, and test them. For example, it was at this time that the Linguistic Data Consortium was formed, and it developed a number of different corpora, usually from the Wall Street Journal. One of the corpora they developed, for example, would have markings for the part of speech of each word. From that, we moved to a different paradigm, where we developed tools that learn from annotated data how to produce the tags automatically. So at this point in time, we began developing very accurate tools for the basic parts of natural language processing: part-of-speech tagging and parsing. I personally am very interested in applications and problems, so I love it when I talk to somebody who has a lot of language data and would like to be able to use it or do something with it, and we can work together to figure out how to do that.

Well, in graduate school I studied machine learning. My adviser held a joint appointment in computer science and statistics, so through studying with him I started taking a lot of statistics courses, and that's where I became excited about the intersection of machine learning and statistics. To some people, machine learning is statistics; I'm sort of one of those people. So in my Ph.D. I really studied a lot of ideas in both fields, computer science and statistics. During my Ph.D., in the late '90s, I took a course about text analysis, and I got excited about analyzing document collections, partly because I had intuitions about the data (I can have intuitions about articles and documents that I understand), and partly because the data fit on my hard drive. I would analyze 10,000 documents, but 10,000 documents is not that big, and it easily sat on a hard drive in 1998 or 1999. So I got involved in text analysis, and in particular in unsupervised learning for text analysis, which means taking a big collection of text without any assumptions, labels, or metadata, and understanding what hidden patterns play out in those documents. So I got interested in that in graduate school. I was in grad school at Berkeley, and after graduate school I did a postdoc working on natural language processing at Carnegie Mellon. Then I was a professor for a while at Princeton, and I came here in the last two years to be part of the Data Science Institute and to help define this new field of data science.

When I reflect on it, my journey toward the field of data science began over 20 years ago with a basic love of mathematics, then moved into the fields of statistics and biostatistics and the necessary aspects of computing; and the nature of the work I do really requires team science. At its heart, I view data science as an interdisciplinary field that brings together experts from many different areas. My research program over the last 15 years, in imaging, I think really even preceded popular use of the term data science. So it's exciting for me to see that data science has so many applications.

I got excited by data science when I first started really doing AI and computer vision. There, the goal was to make systems that would recognize a face. We started writing rules, saying: if the eyes are this far apart, and if the nose is this proportion of the width of the mouth, then it's this person, or it's that person. These rules and these kinds of ideas only went so far, because we were trying to think of how we, as human beings, solve an intelligent problem. But it turns out it's much better to just show the computer thousands and millions of images of faces and let it figure out on its own what the valuable rules are. It's much better to teach people by showing them and by letting them do things, and I think it's also much better to teach machines by showing them and letting them explore their own hypotheses, rather than writing a recipe of intelligence, or a recipe of face recognition, for the machine.

When I was first told about the opportunity at the Data Science Institute at Columbia, I was working for a defense contractor, doing very, very exciting technology, some of it actually in the area of data science. I said, I'm really busy, I love what I do, don't bother me. But then I started to think, and I realized that this is a really unique moment in time. It's like being invited into a startup at Columbia University. How could you say no to that? So I said yes, and here I am.

I majored in mathematics because it came easily and I didn't have to work very hard. I graduated, though, with a feeling that I really didn't have a big picture of what math was about, because I kind of skated through. So I applied to a master's program in math at Berkeley, thinking that I could maybe integrate everything together. Somewhere along the line, I took a statistics class and went, oh, I can do this? That was the first little bit. Then, after my first year in the graduate program in statistics, I spent the summer at Bell Laboratories, and that was the moment. I sat with some of the most talented statisticians in the world, some of the most amazing (I'm just going to call them) storytellers in the world: people who could sit with a data set and pull out the stories, pull out what was essentially going on there, in a way inherited from the legacy of John Tukey that was so astounding. I then had the great fortune of working there for 10 years after getting my Ph.D. That's how I started.

My first experience with data science was as an undergraduate. I was exposed to a lab that was studying the simulation of the immune system, and I was fascinated by the possibility of gaining insights, through simulation, into a system as complex as the immune system. Subsequently, I went to medical school and developed a deeper understanding of how things function normally in the human body, and of what happens when they don't. That gave me insight into how we could leverage the data generated in health care to understand the physiologic processes that occur, and to better understand the effects of the interventions we apply in health care.

Well, the fields I work in are mainly quantitative finance and risk management, as well as operations research, which is the science of decision making.

Now, data actually plays a big role in both of those fields.

Perhaps it has not played as big a role in operations research in the past as it should have, but I think people are now becoming very aware that data needs to inform decisions. So you see data science playing a bigger and bigger role in those fields. On top of that, a lot of the methodologies and algorithms are just of interest to me anyway; the nerd side of me enjoys that stuff.

I was trained as a civil engineer. When I got into research, the way we were traditionally taught to do research was hypothesis testing: you have a hypothesis, you go and gather a data set, you look at whether your data set proves or disproves your hypothesis, and so it goes. But as we move into the 21st century, I think the challenges we face as engineers are only going to become more complicated, and it's really difficult to address those challenges through hypothesis testing. For example, if you want to know how a city functions, it's difficult to have a hypothesis about that. So I became more and more interested in how data could solve some of these problems, or answer some of these questions. My research group started to gather very, very large data sets, and we started to use the analysis of those data sets to tell us what was driving some of the behavior we were seeing. That's how I got into data science.


1.6 Lecture: What do you predict will happen in Data Science in 5 years?


Bigger questions. These are often very hard to answer, but they are bigger problems. As one example, there is work we are just starting with a young social worker who came here from the University of Michigan, where we're looking at whether we can detect on Twitter when conversation on social media indicates that there will be gang violence, and alert community workers so that they can work to prevent the violence. That's an example where the data is streaming Twitter, and we're looking at new technology to be able to detect threats of gang violence and prevent it.

Scale will no longer be an issue. I think we have made great progress in scaling up statistics and machine learning algorithms to massive data, and we won't be hindered by the size of the data we need to analyze in any domain, in science or in the technology industry. I think that in five to 10 years we'll see new kinds of data science solutions to problems about true understanding of processes in the world: we'll be able to take observational data and form hypotheses about the interlocking causal mechanisms that created those data. These are big challenges, which is why this is a five- to 10-year type of goal, but I think we're going to make a lot of progress on that important problem.

I think data science will be immersed in essentially everything we do. Growing toward an evidence-based mechanism for decision making is, I think, important to all sectors of business, whether it's health or finance. And I think the availability of data will allow us to see the benefits of making our decisions based on that empirical evidence. So I think it'll continue to explode.
It will continue to emerge and, I think, make many advances for society.

I'm very bullish about data science. I think it's becoming a ubiquitous Renaissance science that's affecting almost every field. Across campus at Columbia, everyone's calling themselves a data scientist, which I think is great, because people are realizing that a lot of what's happening in our fields is being driven by data: new sources of measurement, new information. So we're all able to build better models and reach a better understanding, and also to collaborate, because we have this lingua franca, which is the data. I see many more collaborations emerging between fields that don't otherwise speak to each other, because now they understand each other's domains through this intermediary: the data and the data science around their domains.

Aside from engaging with computers and robots and having them understand the language we speak and things like that, I think the biggest impact is going to be in how we interface with them. This whole brain-computer interface, I think, is going to really fundamentally change our lives: not needing a keyboard and not needing a mouse, but thinking, I'd like my TV to turn on; understanding in real time what my house environment is like; and being able to interact with intelligent computers in your house and in your workplace without having cords dangling from you. I think that's going to be the big exciting change.
Well, one thing that's definitely going to happen is that, now that people are starting to realize the commonality in the pain points across different fields as they try to make sense of data, there will be a codified training in data science. People are starting to realize what is common about the pains of biology becoming data driven, or media, advertising, finance, or other fields becoming data driven (particularly now in health, say, in making sense of health records), and that there are common practices and common tools. So one thing I think we'll see is far improved clarity about what data science is and what a training in data science is. How do we go about teaching people a set of methods that will make them competent and potent in these different fields as they go out and try to make sense of the world's data? So training, I think, is one thing that will definitely happen in the next five years.

The field of data science is only going to continue to grow. We're experiencing growth in the computational capacity we have, as well as in the kinds of data we can collect. I think health care is absolutely going to be revolutionized by data science. We're at its infancy in health care, I think: we're beginning to collect the data that's relevant for the purposes of analysis, and once we have that data, we're going to be able to do amazing things.

There's a lot going on right now in economics. There's been, if you like, a mini-revolution, where a lot of academic research in economics is being driven by big data. What I'm getting at there is that traditional economics is often more mathematical, driven by theories and so on.
But now, with the advent of data science, people are realizing there are enormous data sets out there that can be used to empirically answer questions about, and test the implications of, these economic theories. So people in economics are starting to do that, and to focus on it, a lot more.

Oh, I think the field of data science is only going to grow over the next five to 10 years. I think we're only seeing the tip of the iceberg when it comes to what data science can do to address some of our societal problems. I think we're going to see an explosion in the number of people who get data science training, but we're also going to see the everyday person on the street becoming data literate and starting to contribute to our data-rich societies in ways they're not contributing now. Many people now contribute by generating data. But as people get more and more educated, and as we move further and further into this data revolution that's happening in our world, I think they're going to get much more sophisticated and actually start to use the data that they and others generate to solve their own problems. So I think we're moving toward a data-literate society.

1.7 Lecture: What skills does a Data Scientist need to be successful?


So a data scientist of today needs to know the foundations of data science. This would include the theory: the mathematical foundations and the statistics behind how we do prediction and analysis of data. They would need some computer science approaches; for example, they would need to know about machine learning. They would also need to know a bit about the kind of processing we have to do when we have big data: algorithms for big data, storing big data, accessing big data, and parallel approaches to processing. And then I think it's really important for people to see how data science can be applied to real-world problems.

There are these three fields: computer science, statistics, and optimization. I think a modern data scientist should be versed in all three fields to some degree. In computer science, knowing about algorithms, machine learning, and complexity theory is very important. In statistics, knowing about inference, Bayesian statistics and probabilistic modeling, and probability theory; these are very important tools. And in optimization (well, the name says it all), knowing how to optimize complicated functions with respect to their parameters is essential. When you start studying data science, you see that these three fields intertwine.

As much computing as one can obtain is, I think, beneficial, beginning just with the process of programming and the logic of programming. Those skills translate very, very naturally to data science.
Coming from my own discipline in biostatistics, I think statistical reasoning is also very important, and it goes beyond just the tools you acquire in the context of a course: it's a way of thinking that allows you to bring a certain element of problem solving, and to join with the many others who are key players in data science.

Columbia has a great program in data science, and we're trying to blend backgrounds from different departments and fields. You need things like computation, computer science, and databases to understand how to extract information from data. But you also need things that come from other departments, like optimization, which comes from our Industrial Engineering and Operations Research department. You definitely need a lot of statistics courses, because statistics is interested in inverse problems: getting the model that generated the data, working backwards that way. Electrical engineering courses are also important; information theory is another approach to, again, solving inverse problems of understanding the phenomenon that led to the data. So I think it really is a multi-departmental initiative, and the best way to learn about it is to take courses offered by all these different scholars and fields, and we're bringing that together, I think, really nicely at Columbia through our data science program.

Something that is really important, and I think unique and special, about the Data Science Institute at Columbia is that it's a university-level engagement. Faculty from across the university were housed in the School of Engineering, because there's tremendous strength in the School of Engineering, but you need all of that domain expertise.
You need the medical school, you need public health, you need the law school. There's a little bit of data science organically in every single school here, but then you need that umbrella, that framework, and those tools to really understand what it means to be a data scientist: the machine learning, the natural language processing, the strong statistics background you need to be able to apply it to those domains. So what you really need is a combination of faculty teaching you in the domains and in data science at large, in terms of those fundamental tools.

There's a hard and a soft skill set requirement in data science. The hard skills include, for machine learning, a broad understanding of machine learning, so that you know the right tool for the right job, and a deep understanding of machine learning, so that you know how to extend each of those machine learning methods to apply to a particular data set. An additional hard skill set is needed in computation, since data analysis is done on a computer; you do not do data analysis entirely with pencil and paper. At some point somebody gives you the data, and it's a file on a computer, or it's an API stream. So computational skills and good software engineering, or what you might call computational hygiene, are going to be necessary. On the soft side, it's also a very collaborative effort. If you are applying machine learning to a different domain, what's hidden in the word "applying" is listening to somebody from the real world speak and learning their language, so that you understand how to reframe their challenges as machine learning tasks. It also means interpreting your machine learning results in such a way that you can communicate to somebody what you've learned by analyzing their data set.
So there's a soft set of skills: the ability to listen, and the ability to come to see how the problems of a domain can be reframed as machine learning tasks. And I think some of those soft skills also bleed over into the way you are as a technical collaborator, so just collaboration skills with other people who are computational.

There are the obvious skills. A data scientist needs to be able to manage large amounts of data, and they need to be able to program. Basically, they need to be able to handle lots of different programming languages, but more in a scripting sense. They've got to be able to be a data munger and move data around from different platforms and different sources. They need to be able to program well: they need to process massive amounts of data, and in order to do that, you must program well. But very importantly, I think they also need to understand the techniques behind the data science algorithms. Many times these algorithms are used as a black box, and the user, while they can code, may not really know what they're using: what the algorithm actually does, and what its strengths and weaknesses are. When data and algorithms are used in that way, that's often a recipe for mistakes. So I think a good data scientist also needs to understand the strengths and weaknesses of their algorithms, and ought to be alert to hidden biases that might creep into their analysis. Finally, curiosity. Curiosity is great in every field, and it's very important in data science as well.

I think it's essential for data scientists to have a training in probability and statistics. It's also important for them to have a training in some of the computational techniques used in everyday data science applications, such as machine learning. They also need to understand how to explore their data set before they analyze it.
So exploratory data analysis and visualization is also an essential skill. And finally, they need to have an application space: an understanding of a domain such as health care, journalism, business, or finance. They need some understanding of a domain so that they can actually apply their data science skills in a real-world situation and solve real problems.


1.8 Lecture: What should a non-Data Scientist know about Data Science?


Non data scientists need to know about the techniques and the technology that are out there, even though they don't need to program themselves.

They need to have some idea of who is developing that technology, what they're doing, what the limits are on what they can offer, and what the general approaches are.

They need to know enough to be able to ask questions about what other people are doing. And then I think they need to understand how data can be used and, in particular, what impact it can have for a variety of different kinds of applications.

I think non data scientists will come to appreciate the power of data, again, in decision making, in guiding practice, and in helping to assess quality control. So I think non data scientists will certainly see the value of having their business or their research impacted by data. In my view, non data scientists should work hand in hand with data scientists and, in fact, converge toward a data science world, in the sense that basic data science literacy for all will improve the interaction between non data scientists and data scientists.

I really believe that we're all generating that digital exhaust. Right? We're all partaking and participating in the digital world, so whether you're cognizant of it or not, you are part of the data science world. And I think what individuals need to understand is the impact it's going to have in terms of tools and services, and what benefits data science can have for you. In the medical world, right now we're all wearing these Fitbits and measuring our exercise and things like that. But moving forward, we're going to have new kinds of sensors that, combined with machine learning techniques, will let us predict our health. So we're going to know to stay home tomorrow because we're going to come down with the flu, and not infect everybody around us. I think what people have to understand is that the train's left the station. Data science is going to impact our lives, and hopefully it's going to make them better.
Every grade-schooler learns to read and write, even though we don't expect them to become poets or writers. Right? They'll go off to do whatever profession they go into: statisticians, what have you. We don't necessarily expect them to be poets. So I think, by the same token, every grade-schooler and high-schooler needs to know a bit about how computers work, about code, and a bit about how data are collected and how they're analyzed. There's so much data being published now that there are some basic questions people have to be able to ask.

Because I think, ultimately, your question about the skill sets that non data scientists need is a question about responsible citizenship: citizens in a democracy that rides on data in some way, one that regulates businesses that ride on data and manages its resources based on decisions from data and algorithms, need to be able to ask questions. In the same way that 10 years ago I think we needed a spreading of computational literacy, I think there will be a need for data literacy. People will have to understand better how others are using algorithms informed by data to make decisions about you, or decisions around you, or suggestions to you. And my hope is that people have a far better understanding of the fundamentals of probability. This has been true for centuries: ever since "lies, damned lies, and statistics," it's been well documented that it's easy to confuse people with statements about probability. And people, I think, would be well served by increased critical literacy around data.

I think they need to understand that the field is growing, and that they're going to be a part of that revolution in one way or another. They are providing data to various institutions, and they should understand the possibilities that exist with that data, and the value of that data. I think individuals need to understand their rights with respect to their own data, and understand exactly what it is they're giving away when they give away their data.

A manager running a business that may generate a lot of data needs to understand what sorts of questions the data can be used to answer, but also what sorts of questions it can't be used to answer.
A very important issue as well is the distinction between correlation and causation. You might see correlation in the data, but that doesn't mean one thing causes another. That can be very important when the manager is trying to spend money to influence some of the outcomes it sees in its data set. This is also true in public policy, by the way, when governments are allocating huge amounts of money to research in health, medicine, and other fields. It's very important to use that money well.

I think eventually everybody is going to have some form of digital literacy. It's essential, because everybody is generating the data that data scientists today are analyzing. So I think everybody needs to have some understanding of what the field of data science is about. They need some literacy in the techniques, and they really need to understand what data scientists are doing with the data they are generating themselves, and how that data is being applied. I think it's essential, if you're participating in today's data-rich society, to have a little bit of training in this area.

When reposting, please cite the original address: https://www.6miu.com/read-2600304.html
