Paper title
GLM for partially pooled categorical predictors with a case study in biosecurity
Authors
Abstract
National governments use border information to efficiently manage the biosecurity risk presented by travel and commerce. In the Australian border biosecurity system, data about cargo consignments are collected from records of directions, that is, records of the actions taken by the biosecurity regulator. This data collection is complicated by the way directions for a given entry are recorded. An entry is a collection of import lines, where each line is a single type of item or commodity. Analysis is simple when the data are recorded in line mode, because a direction is recorded individually for each line. The challenge arises when data are recorded in container mode, because the same direction is recorded against every line in the entry. In other words, if at least one line in an entry has a non-compliant inspection result, then all lines in that entry are recorded as non-compliant. Container mode data therefore make it difficult to estimate the probability that a particular item is non-compliant, because the records of non-compliance cannot be matched to individual lines. We develop a statistical model that uses container mode data to help inform the biosecurity risk of items. We use asymptotic analysis to estimate the value of container mode data relative to line mode data, conduct a simulation study to verify that parameters can be estimated accurately in a large dataset, and apply our methods to a real dataset, for which the new model recovers important information about the risk of non-compliance.
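The container mode recording rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function names are hypothetical, the per-line probabilities are made-up inputs, and the assumption that lines fail inspection independently is ours, introduced only to show why entry-level records blur line-level risk.

```python
import math


def record_container_mode(line_results):
    """Apply the container mode rule: every line in the entry receives the
    same record, which is non-compliant (True) if any single line is."""
    entry_flag = any(line_results)
    return [entry_flag] * len(line_results)


def entry_noncompliance_prob(line_probs):
    """Probability that an entry is recorded as non-compliant.

    ASSUMPTION (not from the source): lines are non-compliant independently,
    so P(entry flagged) = 1 - product over lines of (1 - p_line).
    """
    prob_all_compliant = math.prod(1.0 - p for p in line_probs)
    return 1.0 - prob_all_compliant


# Hypothetical entry with three lines; only the second fails inspection.
observed = record_container_mode([False, True, False])
print(observed)  # all three lines carry the non-compliant record

# Hypothetical per-line probabilities for a two-line entry.
print(entry_noncompliance_prob([0.1, 0.2]))  # approximately 0.28
```

Under this simplified assumption, the recorded outcome depends only on the combined probability across the entry, which is why the line that caused the non-compliance cannot be read directly from container mode records.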