Speaker: Prof. Cui Hengjian
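Title: Model-free feature screening based on Hellinger distance for ultrahigh dimensional data
Time: 2:30 p.m., Friday, March 29, 2024
Venue: Tencent Meeting 243-343-042
Organizers: Institute of Mathematics; School of Mathematics and Statistics; Institute of Science and Technology
About the speaker: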
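Cui Hengjian is a professor and doctoral supervisor at Capital Normal University, a member of the 10th National Committee of the China Association for Science and Technology, and a former member of the Discipline Appraisal Group of the Academic Degrees Committee of the State Council. He received his PhD from the Institute of Systems Science, Chinese Academy of Sciences. He has obtained many important research results in big-data statistical modeling, high-dimensional statistics and its robust statistical theory and methods, statistical machine learning, financial statistics, and quality management. He has published more than 180 papers, including papers in the leading international statistics and econometrics journals JASA, AoS, JRSS(B), Biometrika, and JoE. He has led a Key Program project of the National Natural Science Foundation of China, a Distinguished Young Scholars (B) project, and several General Program projects, and has been a principal participant in major research fund projects of the Ministry of Education and in projects under the Ministry of Science and Technology's 863 Program. He currently serves on the editorial boards of the Chinese and English editions of Acta Mathematica Sinica and Acta Mathematicae Applicatae Sinica, and of Statistical Theory and Related Fields. He is also Vice President of the Chinese Association for Applied Statistics, Vice President of the National Research Association for Industrial Statistics Education, President of the Beijing Applied Statistics Society, and an executive council member of the Institute of Mathematical Statistics (China chapter). His awards include the Second Prize of the Natural Science Award in the Ministry of Education's Science and Technology Awards for Higher Education Institutions and the First Prize of the National Outstanding Achievement Award for Statistical Science Research.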
Abstract:
With the explosive development of data acquisition and processing technology, the dimension of features increases exponentially with the sample size, which poses great challenges for data analysis. It is vital to accurately identify the useful features among thousands of candidates. In this paper, we develop an omnibus model-free feature screening procedure based on the Hellinger distance, which has several appealing merits. First, we define the Hellinger distance index for discrete response variables in discriminant analysis. Second, the procedure works consistently for continuous response variables, which are discretized by a slice-and-fuse technique. Third, it is robust to potential outliers and model misspecification. Theoretically, the procedure possesses the sure screening property and the ranking consistency property for both discrete and continuous response variables under mild conditions. Numerical studies demonstrate that the procedure is highly competitive on heavy-tailed and skewed data while remaining comparable to existing approaches on light-tailed data, indicating robust performance across a range of data types. Two real-data examples, one with a discrete response and one with a continuous response, illustrate the effectiveness of the proposed method.
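
To give a feel for what marginal screening with a Hellinger distance can look like, here is a minimal Python sketch. It is not the speaker's procedure: the abstract does not state the exact index, so the code assumes a simple stand-in in which each feature is scored by the largest pairwise Hellinger distance between its class-conditional histograms, using the standard discrete form H(P, Q) = sqrt(1 - sum_i sqrt(p_i q_i)). The function names `hellinger` and `hellinger_screen`, the quantile binning, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors."""
    return np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q))))

def hellinger_screen(X, y, n_bins=8):
    """Score each column of X by the largest pairwise Hellinger distance
    between its class-conditional histograms (assumed, illustrative index)."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        x = X[:, j]
        # Bin edges at pooled quantiles, so outliers do not stretch the bins.
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
        # Map each observation to a bin index in 0..n_bins-1.
        idx = np.searchsorted(edges[1:-1], x, side="right")
        hists = []
        for c in classes:
            counts = np.bincount(idx[y == c], minlength=n_bins)
            hists.append(counts / counts.sum())
        scores[j] = max(
            hellinger(hists[a], hists[b])
            for a in range(len(classes))
            for b in range(a + 1, len(classes))
        )
    return scores

# Simulated example (hypothetical data): heavy-tailed noise, one active feature.
rng = np.random.default_rng(0)
n, p = 200, 500
y = rng.integers(0, 2, size=n)
X = rng.standard_t(df=2, size=(n, p))
X[:, 0] += 3.0 * y                      # only feature 0 depends on the class
scores = hellinger_screen(X, y)
top = np.argsort(scores)[::-1][:10]     # retain the 10 highest-scoring features
```

For a continuous response, one could first cut the response into quantile slices and pass the slice labels as `y`; that covers only the slicing step and does not attempt the fusion over several slicing schemes that the abstract mentions.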