[Musical Cue][VOICEOVER] National Data Archive on Child Abuse and Neglect. [Clayton Covington] All right, it is now 12:00 p.m. Eastern time, so welcome everyone to the 2024 NDACAN Summer Training Series. For those of you who are new to the NDACAN Summer Training Series, this is an annual series that we host in order to take a deeper dive into various data holdings and also emphasize best practices for the use of NDACAN data. In fact, this year's theme is best practices in the use of NDACAN data, and this series is going to be hosted every Wednesday, starting today until August 14th, from 12:00 p.m. Eastern time to 1 p.m. Eastern time. We have a variety of presentations, but I'm now going to hand it over to Dr. Cara Kelly. [Cara Kelly] Hello everyone, I'm Cara Kelly from the Children's Bureau, and I am the COR for the NCANDS data collection effort at CB, as well as a COR for the data archive. NCANDS is really an important data collection effort for us at the Children's Bureau, as it originated with the 1988 amendments to the Child Abuse Prevention and Treatment Act, commonly known as CAPTA in the field. Since that time we've had a number of subsequent amendments to CAPTA that have led to several new data elements that are included in the data collection effort. But it's important to note, and I think a lot of folks in the research community don't always necessarily know, that while there are some required data elements in CAPTA, NCANDS remains a voluntary data collection effort by the states. And our success in gathering the data in NCANDS is largely due to a strong state and federal partnership and the incredibly hard work of our contractors over at Walter R. McDonald and Associates, known as WRMA, who do most of the work on behalf of the Children's Bureau for NCANDS.
And so I am here today because I really just wanted to take a moment to let you all know how excited I am to see such a great turnout for this session of the summer webinar series, as we really value the use of our NCANDS data by researchers in the community in answering important questions of interest. I also wanted to take a moment to thank the staff of the archive for always putting on such a wonderful summer webinar series that fills such an important gap in the field in supporting the use of NCANDS data by researchers. Sarah really is a fantastic expert in this data set, so I'm hopeful everyone today is able to learn some important information and that you all will continue to use the archive as an important resource in your future work. And I'll turn it back over to Sarah or to Andres. [Sarah Sernaker] I'll take over, unless, Andres, do you have anything to add before we start? [Andres Arroyo] No. [Sarah Sernaker] Okay, I'll add something about Andres: if you're having any technical difficulties, or anything like that with accessibility, Andres would be your contact, and his email address was provided in the email you should have gotten when you registered for today. But that aside [Clayton Covington] Sarah, I'll add one thing, I apologize, I forgot to mention it at the beginning. I know you all will likely have questions as the presentation goes on. In order to facilitate a swift Q&A, we're going to hold all questions to the end, and you can ask your questions by specifically using the Q&A box on your screen via Zoom. [Sarah Sernaker] Yes, thank you Clayton. And I don't see the chat come in, so Clayton manages all that very nicely. So let me just dive in. Like I said, this is our first presentation of the summer.
So today we're going to be talking about NCANDS, and particularly the strengths and limitations of using it. This is not just for new users; it's also for experienced NCANDS users: things to think about while you use NCANDS to get the most out of it, but also to understand the limitations that do exist and can't really be gotten around, besides addressing them and doing the best we can. And so we'll talk about that. Today we're talking about NCANDS; next week we'll be talking about reporting issues in NCANDS and AFCARS. I'm trying to think who's doing that, I think Alex Roehrkasse. The week after, we'll be doing an analogous talk but for AFCARS, its strengths and limitations, so definitely join us then for a similar sort of discussion. Then we'll have a presentation about survey design and using weights, so that'll be a broad picture of survey analysis. August 7th we're doing a presentation on NSCAW 3. And then August 14th, our last one, will be a similar presentation for our NYTD data set. So all of our admin data sets. Today I'm going to talk about a little of the NCANDS background, just to give us all a basic understanding of what NCANDS is, how it's collected, the two different files that are encompassed in the NCANDS name, and then the strengths and the limitations when you use this data set. I will say, before we dive in, the best resources whenever you're using this data are our codebooks and user's guides, and those are available from our website, NDACAN; I don't know the full website, maybe Andres or Clayton can put it in. If you haven't seen our data website, that's where all of our documentation is, and that really holds a lot of key information when you're utilizing this data. I have it open every time I use it, and I've been using this data for years now. So I highly recommend, anytime you use this, to just open up your documentation to refer to.
So the background: as Cara very nicely put it, the National Child Abuse and Neglect Data System was designed in 1988 in response to CAPTA, which required a national data collection and analysis program on child maltreatment. NCANDS is the primary source of this national information on abused and neglected children reported to state child protective services. There's really no comparable data to NCANDS, and so it really is the state-of-the-art data regarding child abuse and neglect. The data itself are funded by the Children's Bureau, which lives in the Administration on Children, Youth and Families, as part of the Administration for Children and Families, all within the US Department of Health and Human Services. Just to give a sense of who's involved. So as I've mentioned, NCANDS is a federally sponsored annual national data collection, and it tracks the volume and nature of maltreatment reporting, so the number of maltreatments and the nature of the maltreatment. As Cara had said, this is in response to an act, CAPTA, but this is still a voluntary submission, and so states don't necessarily have to submit data. But the way the data collection process has worked is that nowadays funding or other things are tied to data submissions, and so there's a lot of incentive to provide data, not to mention it's just become the standard, and all states nowadays do submit data, with a few minor exceptions. So it really is the standard to provide data every year. And submitted data, the data that ends up with us and that we distribute to the public, consists of all investigations or assessments of child maltreatment, and so that could be substantiated or unsubstantiated claims. Any report that received an investigation is what NCANDS has information about. And so, yeah, any report that receives a disposition.
A disposition is just the finding of the investigation, and so a disposition is either substantiated or unsubstantiated, to put it simply. The findings from NCANDS data are published by the Children's Bureau each year in the Child Maltreatment report series. I wanted to mention that because they do a great job with the yearly reports: summary-level statistics, the number of maltreatments, the number of children experiencing maltreatment, and perpetrator information. And so that's a really great place to start if you're not familiar with NCANDS, to get a sense of the differences at the state level and, you know, high-level differences. That's available through the Children's Bureau website. So the way the data are collected is that each state collects data using different concepts and definitions based on differing state policies and laws. This affects the variables that are collected, variable names, and valid values, so think of each state as its own sort of data collection entity, because that's really what's happening. These data collection efforts are dictated by state laws, and we could even talk about county- and agency-level differences, but at the highest level there are just differences in state laws, the way they define things, and the way their laws are written. And so this inevitably affects the way the data are collected and, you know, look in the end. When states collect the data, they organize it all into an annual system and send it to the Children's Bureau, which applies quality standards. So even though there are differing state definitions, CB does try to conform all of the variable values and definitions as much as possible. And it's an iterative process, where the Children's Bureau will be in talks with states until a state can meet the standards for NCANDS data.
So NCANDS definitely goes through a lot of iterative cleaning to make sure it is up to standard and of the highest quality we can make it. I don't know why I say we, because CB then sends it to us: the Children's Bureau does their set of cleaning, they send it to NDACAN, we do further cleaning and a few variable creations, and then it's released to the public. I forgot to mention suppression, so there's also suppression that's done, along with all of the sort of basic data cleaning steps you could think of. So let me get into it. NCANDS really contains two sets of files: what's called the child file and what's called the agency file. The child file is kind of synonymous with NCANDS. Most of the time when people are talking about NCANDS they mean the child file, and that's what's used the most in research settings. That is the individual-level data of each child by report. A child could have multiple reports in a year, and on a report there could be multiple children. So a year's worth of NCANDS data is unique by child and report. The child file contains the rows of individual-level investigations of each child, and there's a whole host of information, which we'll get into, about the maltreatment itself, about the child, about the perpetrator, services, etc. So the child file really is the meat of the information of NCANDS and where you get the really granular details about the investigative reports. And the child file goes back to 2000. We'll get into its limitations; one that I'll just mention here is that in the early years of the child file, when data were just starting to be collected, organized, conformed, and cleaned, not all states submitted. Like we've been mentioning, this is a voluntary system, and so there were just a lot of states who did not submit data in the early years.
And there were just, you know, problems and a lot of differences in some of the data that were reported in the early years. But that's a topic for later slides. So that's the child file; like I said, when people mention NCANDS they're almost always talking about the child file. But there also exists what's called the agency file, and the agency file exists from 2009 to the most recent year, 2022. Before that it existed under a different name, the combined aggregate file, so there is a sort of analogous agency file for earlier years. In any case, this is state-level data: if you ever order the agency file, there's just one row per state per year, and it's just aggregate information. It's information that has been requested by CAPTA legislation but is not able to be collected at the case level for whatever reason. As for what's in there, I'm trying to remember now, and I think my next slide has it, so I'll wait for that. This is all to say it's state-level data, aggregate by state. It can be helpful because some of the variables in the agency file aren't collected in the child file, but again you don't have such granularity, and you can't really tease apart aggregates by race or sex, for example. So that's a sort of limitation on using it. I'm just going to talk a little more about the agency file, to kind of get it out of the way, and then we'll shift gears to the child file, because like I said that's the most often used and I think where the bulk of research stems from. In the agency file, records are provided at the state level each year, and all the data submissions are organized by federal fiscal year, and that's true of the child file too. What that means is that the year that defines data submissions, or the data files that you request from us, runs on a fiscal year: October 1st of the prior calendar year to September 30th of the fiscal year.
It's a little confusing, but when we talk about fiscal year 2022, what we mean is October 1st, 2021 to September 30th, 2022. It doesn't usually cause an issue, but it's just something to be aware of, and I'll mention it again later. And when we talk about states, for the agency file and also for the child file, we mean the 50 states plus DC and Puerto Rico. So whenever I mention states, I'm talking about 52 entities. As I said, the agency file is supplemental to the child file. It consists of measures not found in the child file: for instance, the number of children or families receiving services, the number of staff who screen in or investigate reports, and a full count of the number of fatalities. There are measures of child deaths through maltreatment in the child file, but not all states capture all of the fatalities in the child file. And so, especially if you're looking at child fatalities, the agency file might be a good supplement to get a full count. So the agency file, as I mentioned, is supplemental to the child file; it's all aggregate at the state level, and you can't really tease anything out beyond the measures that are given directly. If you're interested in more, again I direct you to the codebooks and the user's guide, which define all of the variables you can find in it. And then we jump to the child file, the meat of the information. Records, as I mentioned, are provided at the level of each child on a report. I'm just reiterating because it's hard to wrap your head around: a child could show up multiple times in a year if they experience multiple reports of maltreatment, and on a single report you could have multiple children. And so that's why we say the report-child pair uniquely defines an observation. So if you think about a year's worth of data, one row is going to pertain to a child on a certain report.
So it's just important to keep in mind when we talk later about identifying, you know, who you want to count by, whether you want to count reports or children uniquely, and all of those things. As I mentioned, data submissions are organized by the federal fiscal year. States include all states, DC, and Puerto Rico. There is a small exception, which Michael Dineen, who is our NCANDS expert, has pointed out: 2000 to 2002 are actually organized by calendar year, which is something I feel like I've overlooked in the past. Very important to keep in mind, because if you're using those early years there would be some overlap between the 2002 calendar year and the 2003 federal fiscal year, because remember the 2003 federal fiscal year will include a few months from 2002. So I just wanted to make that important exception. Otherwise, all of the NCANDS data are organized by federal fiscal year. And data files are organized by submission year, which relates to the date of the disposition of the report, not the date of the incident or the date the incident was reported. So what does this all mean? When I say that states collect data and then submit it, and we release data per year, so think of one data file relating to one year of the child file, what defines which children are organized into this year is the date of the disposition of the report. So it's a little bit of a disconnect. An incident of maltreatment happens, and someone reports the incident, whoever it is, and the incident is investigated, and ultimately the incident is either substantiated or unsubstantiated: there was maltreatment found or there was not. And the date of the disposition, when it was substantiated or not substantiated, is what defines what year this child report ends up in.
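The fiscal-year convention is easy to get wrong, so here is a minimal sketch of it as a helper function (illustrative code, not part of any NDACAN tooling; remember the 2000-2002 files are the calendar-year exception):

```python
from datetime import date

def federal_fiscal_year(d):
    """Map a date to its federal fiscal year: FY N runs from
    October 1 of year N-1 through September 30 of year N."""
    return d.year + 1 if d.month >= 10 else d.year

# FY2022 covers 2021-10-01 through 2022-09-30
assert federal_fiscal_year(date(2021, 10, 1)) == 2022
assert federal_fiscal_year(date(2022, 9, 30)) == 2022
# FY2003 picks up the last three months of calendar 2002,
# which overlap the calendar-year-organized 2000-2002 files
assert federal_fiscal_year(date(2002, 11, 15)) == 2003
```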
And so I'm kind of drawing this out to say that sometimes you'll see an incident date that precedes the year in which you find the data, because the disposition date can come much later; usually it's a month or two, but sometimes it's a whole year. I'm going to talk more about this in a few slides, but it may not be exactly what you intuitively expect going into this, and it's just something to be very much aware of. We work with researchers who don't want to work with the disposition date; they want to work with the incident date. And that's great, you just have to sort of reorganize the data, and usually link a few years together to capture incidents that may have been disposed at a much later time. Again, this is a lot to talk about, and I'll bring it up again when we get into the data stuff. But again, something to be aware of. And the child file is where we have this whole host of information at the child level. It includes demographics of children and their perpetrators. I will say that is in substantiated cases; there is not information on perpetrators, or alleged perpetrators I should say, in unsubstantiated cases. So that would be missing if the maltreatment was unsubstantiated. There are the types of maltreatment, the investigation disposition, risk factors such as alcohol or drug abuse of the parent or even the child, things like that, and services provided as a result of the investigation. This could include foster care services, I think Title IV payments, I get them mixed up. Anyway, a whole host of information relating to the child, the perpetrator, the maltreatment and report itself, and a little bit of what happened next. And so let's talk about disposition more, because that's kind of how the records are organized.
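If you do want to organize by incident or report year rather than disposition year, the reorganize-and-link step just described might look something like this sketch (the field name incident_year is hypothetical; check the codebook for the actual NCANDS date variables):

```python
from collections import defaultdict

def regroup_by_incident_year(files):
    """files: an iterable of record lists, one list per fiscal-year file.
    Each record is a dict with a (hypothetical) 'incident_year' key.
    Pools several submission years, then regroups by incident year."""
    by_year = defaultdict(list)
    for records in files:          # link multiple submission years...
        for rec in records:
            by_year[rec["incident_year"]].append(rec)  # ...then regroup
    return by_year

fy2021 = [{"incident_year": 2020}, {"incident_year": 2021}]
fy2022 = [{"incident_year": 2021}, {"incident_year": 2022}]
grouped = regroup_by_incident_year([fy2021, fy2022])
# 2021 incidents appear in both fiscal-year files, since some were
# disposed (and therefore submitted) a year later
assert len(grouped[2021]) == 2
```

Linking two or three adjacent years before filtering is usually enough to capture the late-disposed incidents.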
All NCANDS reports are disposed as either substantiated, unsubstantiated, which I've mentioned, or what's called an alternative response. And so I just wanted to talk about each of those. Substantiated, or indicated (reason to suspect): this is where maltreatment was confirmed. A child may or may not be removed from the home, and a child and family may or may not receive services due to this. An alternative response is sort of in between substantiated and unsubstantiated, which is kind of why I hadn't spoken about it before: it's a little more nuanced, and I think state reporting is very different in regards to alternative response. Some states have a lot more alternative response cases, and some states I don't think have any. But what is alternative response? This is where a family and their children are at risk of maltreatment: there was an investigation, and maybe no maltreatment was confirmed, but the situation wasn't great. So these are cases where a family or children are identified as at risk and would benefit from support services to avoid family separation and to try to prevent any future maltreatment. This information is in the child file because these would be cases where they're receiving state-sponsored services that are measured through NCANDS, so this information is also captured. And sometimes, if you're categorizing substantiated maltreatment versus unsubstantiated, I've seen cases where alternative response is grouped into substantiated or indicated reason to suspect. And then the unsubstantiated cases are where the investigation yielded no confirmed determination of maltreatment: there was an investigation, and just no confirmed maltreatment was found. So in the child file, as I've mentioned numerous times, the unit of observation, basically what uniquely defines a row, is the report-child combination, also called the RC pair.
We refer to it that way in some places in the documentation. These are defined explicitly in the data with the variables Child ID, so ChID, and Report ID. A Child ID may appear on more than one record because the child could be included on more than one report, and a report identifier may repeat because there will be a separate record for each child on the report. But no two records will have the same Report ID-Child ID pair within the same submission year, because that would just be a duplicate case, and there should not be any duplicates in the child file. So, I think I've reiterated that numerous times. With that baseline understanding of NCANDS, let's talk about the strengths of using it, and the strengths of the data set itself. As I briefly mentioned, this is just frankly the best available data on the subject. There's really no comparable source to what NCANDS provides, unless you're a researcher literally working directly with a state, which I'm sure has its own barriers and hoops. NCANDS is the primary source of national information on abused and neglected children reported to state child protective services agencies. Every investigation comes through NCANDS, and so we just have the data on all of the reports. In addition to being the best available data on the subject, through NDACAN there's also, I think, the best support and resources. These are people like Michael Dineen, who I think is listening and has been working with this data for about 20 years now; I've been working with it for a few years and continue to work with it. We work with researchers, we answer questions directly, and we've created a lot of resources to help us internally, which therefore helps users and which we can share. We provide a lot of hands-on support to help users use NCANDS, understand it, and get the best info they can out of it.
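The report-child uniqueness rule described above is worth verifying when you load a year of data. A minimal sketch in Python, assuming records parsed into dicts keyed by the hypothetical names RptID and ChID (match these to the actual variable names in the codebook):

```python
from collections import Counter

def check_rc_pairs(records):
    """Flag report-child (RC) pairs appearing more than once within a
    submission year; within one year these would be duplicates and
    should not occur in the child file."""
    counts = Counter((r["RptID"], r["ChID"]) for r in records)
    return [pair for pair, n in counts.items() if n > 1]

recs = [
    {"RptID": "R1", "ChID": "C1"},  # two children on one report: fine
    {"RptID": "R1", "ChID": "C2"},
    {"RptID": "R2", "ChID": "C1"},  # same child on another report: fine
]
assert check_rc_pairs(recs) == []
```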
And not only NDACAN: the Children's Bureau, as I've mentioned, has their yearly Child Maltreatment report, which might be sufficient for some of the research out there. Like I said, it has high-level counts by state, and various high-level reports and tables that could be taken right from there. So there's just a lot of resources and data support. This is a data set that's been in existence for over 20 years, so a lot of people have used it, and there's a lot of support and a lot of things uncovered over the years. A huge benefit of NCANDS, which to remind you is about child abuse and neglect, is that it is linkable with our AFCARS data set. As I mentioned, we'll be doing a talk later this summer about AFCARS, but broadly speaking, AFCARS relates to foster care cases and adoption. And so it's really worthwhile in some research to link NCANDS, which has child maltreatment and neglect, with the foster care data, which you can think of as different stages through the child welfare system, and sort of link those two pieces. Another strength, as I mentioned: this data has been in existence for over 20 years, so you can do long-term tracking of changes over time, whether that's differences in state policies or statutes, or the observed impact of historical events such as COVID-19. And you can directly observe the impact using the data, making figures and whatnot, which I've done and will show you in a few slides. And you can compare information between states. I put "with some caveats" in parentheses because it's sort of a double-edged sword: we have this national data collection, and so in some cases you can compare apples to apples, but comparing some states is like comparing apples and oranges.
And so while there is a lot of utility and, you know, a basic understanding to be had in comparing states, it's definitely something I say with caution, and we'll talk about it as part of one of our limitations, actually. And not only at the state level: if you obtain the child file data, there's also county-level information. I will say there is a caveat to that: it is one of the variables we do apply suppression to. There are only a few data entries that are suppressed, due to data privacy and security and just the sensitive nature of this data and the population we're measuring. So there are bits of suppression that go into our data, mostly date masking and whatnot, but one of those is county suppression. Right now we suppress counties that have fewer than a thousand cases, meaning child-report cases. Any county with fewer than a thousand cases would be suppressed. So we mask the county-level information; we just group it into an "other" county, so the state is still identifiable, we leave the observation there, but the exact county is not identifiable. And so that's our current threshold, 1,000. I'll state here, we're actually undergoing some data cleaning: we are lowering the threshold, which will make more counties available. We're going from 1,000 down to 700. We're currently working on that, and that will be released later this summer, and announcements will go out; so that's just something to look forward to, we're making even more data available. But I will say, while this is part of our strength, it's a little prohibitive for rural counties, because those are almost always masked just due to the small population size. But moving to our next strengths: it's not just information about children. As I've mentioned, it's not just demographics or information related to the maltreatment; we also have information about the perpetrator.
Basic demographics like the age and the race, and risk factors of the perpetrator. And I should mention a report can have up to three perpetrators listed. So yes, there could be three perpetrators identified on a report. And on a report there is an explicit sort of mapping, where you could say this perpetrator, with confirmation, did this maltreatment, and that perpetrator, with confirmation, did that maltreatment, and so there are variables to map which perpetrator was responsible for which abuse, untangling all the variables together. There's also information about services received after an investigation. So, as I mentioned, foster care services, whether that's out-of-home care or just other services they received, like monetary services, I think. There's also information about risk factors for the child and caretakers, so as I mentioned, things like substance abuse, or disability of the child or the caretaker. So again, a rich picture, and each observation is densely packed with information. Almost too much information: there's a lot of it, and when you get the child file, untangling all of it definitely takes some time and thought. So definitely reach out if you're trying to make sense of it and need help. But those, I think, were the main strengths. I wanted to highlight one of the strengths I had mentioned, and that was tracking data over time. So this is our pretty picture, a brief pause of information. I put this together because I wanted to show the utility of all of the data we have over time. On the left panel we have the national rate of maltreatment investigations. So this is all investigations of maltreatment, whether substantiated or unsubstantiated. And the point I wanted to show in this figure was the COVID-19 impact. You can see we have 2018, we have 2019, and then 2020, when the COVID lockdown really hit, where we can actually see this marked decline in the rate of maltreatment investigations.
And I don't know if we all want to believe that maltreatment just went down during COVID, or whether it's just a function of reports of maltreatment and who is reporting maltreatment. So what I've done on the right panel is to break down, over time, the proportion of report sources. Each report that comes in comes from a report source, and this information is captured: for example, it could be medical personnel, social services, law enforcement, parents, or other. But my main goal in showing this was the education professionals line and how it took a severe dip during 2020, because we were all on lockdown and inside, and children who might be maltreated at home and who would only have been observed by their teachers didn't have that chance, or it just wasn't happening. And so I think this is a really interesting visual showing that you can see these changes and impacts, and this probably looks really interesting at a state level, but I didn't want to go down the rabbit hole and take up all our time. This is the wealth of NCANDS: we can see the changes and effects of such impacts. Okay, so after all the strengths and whatnot, we have to talk about limitations. For all of the good stuff we can get out of NCANDS, which is a lot, there are serious limitations that need to be considered when you're dealing with the data. And for the most part it's no different than any other data set: every data set has its problems, and you just kind of have to address them, be aware of them, and overcome them. So it's nothing any more substantial than that. But just to highlight some of the things you might come across, I put this as questionable observations; these are more like data nuances, I would say, and here are just two examples to highlight what I mean. The first is you might find a Child ID with an unlikely number of reports within a submission year.
We've seen cases where a child could apparently have up to 10 reports in a year, and so it comes back to: do we believe that this was 10 reports in a year? Do we think it's a problem with dates? Or, most often, is it a problem with Child IDs, which I'll talk about more. Let me briefly mention Child IDs. When a child gets a report and they're going into the NCANDS system, the state assigns them an identifier, and that identifier is consistent, for the most part (which I'll get into), across the years. So if a child enters the child welfare system in 2007, we should still be able to track them if they enter again in 2010, because they should have the same ID. The state assigns the ID, and then they do what's called an encryption. They encrypt the ID so that it's really untraceable. By the time NCANDS gets the data, or even the Children's Bureau, there are just no identifiers, which is how we want it. We don't want any primary identifiers, so there are no names, no Social Security numbers; the only way to identify a child is through this Child ID, which the state is creating and then encrypting. And so usually, when you see cases with multiple Child IDs, or you see the same Child ID but with different demographics, like a different birth date or a different sex or race, it's sometimes just a matter of an encryption problem. Some states are particularly problematic; for example, Puerto Rico, we've noticed, just doesn't seem to have a great method for assuring that an ID has not already been used. So this is just to say that these nuances and sort of weirdness occur, and I think it's just a matter of staying alert.
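Screening for these unlikely ID patterns is straightforward once the data are loaded. A sketch, with ChID standing in for however the child identifier is named in your extract (the threshold of 10 is just an illustrative cutoff, not an official rule):

```python
from collections import Counter

def children_with_many_reports(records, threshold=10):
    """Count reports per Child ID within one submission year and flag
    IDs with an implausibly high count -- often a sign of date or
    ID/encryption problems rather than ten genuine reports."""
    per_child = Counter(r["ChID"] for r in records)
    return {cid: n for cid, n in per_child.items() if n >= threshold}

recs = [{"ChID": "C1"}] * 12 + [{"ChID": "C2"}] * 2
assert children_with_many_reports(recs) == {"C1": 12}
```

Once flagged, you can inspect those records' dates and demographics to decide whether to keep, collapse, or drop them.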
And at that point, if you're finding such nuanced cases, sometimes you just have to make a rule and say, okay, I'm going to collapse this information, or I'm going to take what I think is the best truth based on some proxy or on whatever level of data was captured. Similarly, you could observe multiple reports on the same day. We've seen cases where a maltreatment was perhaps reported multiple times by different people, so you might see multiple reports on the same day; most often those are just duplicates that haven't been deleted. And I will say, although I'm giving this a whole slide, this is really low prevalence. It's something to be aware of if you're doing counting and your research question is really in the weeds: something to check, to understand the extent of, and to know how it could affect your data. But again, these things do not happen with high frequency. So, variation over states and time: I briefly mentioned that comparing some states is like comparing apples and oranges. Sometimes comparing between years within a state is also a problem. As I mentioned, when a child gets an ID, the state creates the ID and it's then encrypted. Usually the child maintains the same ID throughout the years, but in some years some states change their encryption algorithm. This is totally independent of CB and NDACAN; it's just state functioning. For whatever reason they change their encryption algorithm, and that actually breaks the ability to track a child over time. These are what we've called the breakage and linkage years, and I have a table on the next slide that speaks to this problem.
Basically, if you were trying to track a child over multiple years in a state with this breakage, you will simply not be able to track the same child from one year to the next, because their ID literally changes. Again, it's not very prevalent, especially in the later years. We do keep an internal table tracking this, so if you're concerned, send a request to our email and we can make the material available. I'll reiterate this on the next slide, but it's very much something to be aware of. So, not just between years but between states: the reporting states were different in the early years, meaning not all states reported, and it's somewhat random; some states report, then don't report, then report again. But that's mostly 2000 to 2005; by 2008 things had very much settled down, and essentially all states are reporting by 2011. I have a table to show that too. A huge thing to keep in mind when you're comparing states is the differences in state definitions. I talked about this briefly, but it's something I emphasize to anyone using the data: whatever your research project or goal, look at your variables, or your response variable, within each state to really understand how heterogeneous the observations are. I would also recommend confirming with the Child Maltreatment report: in its appendices, each state has its own page, and sometimes you get little snippets like "our state does not collect this information" or "our state doesn't include this setting in our neglect definition." I won't say everything is in there, but I definitely like to refer to the Child Maltreatment report appendices to get a sense. We also have a data set called SCAN, which is available through NDACAN and also Mathematica.
SCAN is our somewhat unusual dual-release data set, but it's super useful. It covers 2019 to 2021, and it's essentially metadata on what the state statutes include: who counts as a mandated reporter, whether the neglect definition includes this, that, and the other, whether the sexual abuse definition includes this, that, and the other. It's a way to get a sense of what is being collected. When we say this child is indicated for neglect in this state, is that exactly the same definition of neglect as in the state next to it? That's the sort of context SCAN can provide. I tell people the only downside to SCAN is that there aren't more years of data. More years are forthcoming, but I don't think any will go back before 2019. You could also talk to your state directly, if you feel so emboldened. The Child Maltreatment report appendices list state contacts, and we've worked with researchers who are looking at specific states; I tell them that if your whole research project is one state, it might be worthwhile to talk to that state directly and really understand how the data are collected and what differences might arise between counties, for example. Some other examples involve differences not just in state definitions but in state reporting, for whatever reason. For example, Maryland has poor reporting of report source: if you compare frequencies of the report source levels in Maryland to other states, you'll see they just don't report information about a lot of report sources; it's "unknown" or "other" for a lot of records. No perpetrator information comes from Georgia, which probably relates to a statute. And there is no race information in Pennsylvania for many years; I don't know if that's a state definition, a prohibition, or just a data collection problem.
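Gaps like the Maryland report-source example can be surfaced with a simple state-by-category crosstab before you compare states. A minimal pandas sketch, with toy data; the column names `staterr` and `rptsrc` and the 50% cutoff are assumptions for illustration, not NCANDS conventions:

```python
import pandas as pd

# Toy data: report source by state; a high share of "unknown" signals
# poor reporting of this variable in that state.
df = pd.DataFrame({
    "staterr": ["MD"] * 4 + ["VA"] * 4,
    "rptsrc":  ["unknown", "unknown", "other", "unknown",
                "medical", "education", "law enforcement", "medical"],
})

# Share of each report source within each state.
by_state = pd.crosstab(df["staterr"], df["rptsrc"], normalize="index")

# States where "unknown" dominates deserve scrutiny before any cross-state comparison.
flagged = by_state.index[by_state["unknown"] > 0.5].tolist()
print(by_state.round(2))
print("check before comparing:", flagged)
```

The same pattern works for any response variable: crosstab it by state (and by year within state) and look for states that behave nothing like the others.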
Again, the states are each their own data collection entities, and we're trying to fit all the pieces together to construct NCANDS. So I highly recommend: if you've identified a main response variable from NCANDS, look at how it behaves by state, and by year within state, because any differences you see might not be causal or due to the explanatory variables you've included; they might simply be because the data, and the data collection, are different. We can only do so much, and it's a lot to keep in mind. This is a limitation, but I don't think it's prohibitive. There are differences in state definitions, and they can be significant; if you're basing all of your research, your thesis, on this, it's really important to understand the underlying differences that could be affecting what you observe in the data. So that's my spiel. This last point just reiterates that state-to-state variation in child maltreatment laws and information systems may affect the interpretation of the data. Again, refer to the state mapping documents included in the data; these are documents we include in the data package that I always forget about too. They're another piece of information to help you understand how data in one state might be collected differently from data in another. But let me stop going on, because I have a table to put up. [Clayton Covington] And Sarah, just a warning, we need like five to seven more minutes. [Sarah Sernaker] I've got the time in front of me, luckily. I tend to ramble on, but okay, thank you, Clayton. So, really quickly, this is just to demonstrate the breakage and linkage visually; this is specific to the linkage problem I was outlining with the child IDs and the change in encryption.
This is just a subset of the states; obviously I couldn't fit the whole thing. It goes from 2000 to 2021, and each column measures whether you can link the two adjacent years together. Gray means the state did not submit data; for example, Alabama didn't submit data until 2005. A one indicates that linkage is possible: there was no change in encryption. A zero indicates breakage in linkage. That would mean if you were using data from Alabama from 2005 to 2021, you would not be able to track the same child all the way from 2005 to 2021 using the Child ID. Again, that happens at the state level and is out of our control; I can't speak to why they do it. But this is really useful to know, especially for longitudinal analysis: it's not that you're simply not observing the child, it's that you can no longer identify them with the Child ID. That is the breakage and linkage we're talking about, and we can make this table available by request. Here's another helpful table, giving a sense of which states submitted by year. We start at the earliest, 2000, with the states who were submitting, and I've listed whichever list was shorter. So there were fewer states submitting than not: we have 20 states who submitted in 2000; it goes up to 24 in 2001; but then in 2002 there are 42 states submitting and only 10 that did not. Like I said, it just increases over time, and by 2010 Oregon is our holdout, but by 2011 and 2012 all states are reporting. Then we just have anomalies: for example, Puerto Rico didn't submit in 2016, and Arizona didn't submit in 2021. We do get resubmissions each year, so when we get new data submissions for the new year of data, sometimes we also get updates for previous years.
So, for example, Arizona might be in the realm where we eventually get the data; it doesn't seem likely we'd get Puerto Rico, since that's pretty far back. But anyway, that's the state of things: all states are reporting as of now, and like I said, there's a lot of incentive to report. Quickly, I've touched on suppressed and missing data. Among the masked variables we have is county masking, and that's a big one for identifying rural populations. Under our current threshold, for example, in 2021 there are 740 identifiable counties; to give you a sense, there are about 3,000 to 3,200 counties total in the US. We also mask dates: the report date is rounded to the 8th or the 23rd, basically whichever it's closest to, and then we shift all the other dates by the same amount. So if we move the report date two days forward, all the other dates in the data are moved two days forward to maintain the time spans between them. Dates of birth are omitted, though we provide the child's age. Other data are suppressed if a child experienced death due to maltreatment: we suppress a lot of information there due to the sensitive nature of these cases. There is no geographic information, including state, so for a maltreatment death even the state is not identifiable, and the child and perpetrator IDs are suppressed, so you couldn't link to previous records. Another more nuanced suppression in our round at NDACAN: if there is only one individual of a certain race within a county, we suppress the race, again due to disclosure risk. And that's our suppression.
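The date-masking logic just described (round the report date to the 8th or 23rd, then move every other date by the same offset so intervals between dates are preserved) can be illustrated with a short sketch. This is only a model of the idea, not NDACAN's actual masking code, and the column names `rptdt` and `dispdt` are assumptions:

```python
import pandas as pd

# Toy record: a report date and a later disposition date.
rec = pd.DataFrame({
    "rptdt":  pd.to_datetime(["2021-03-05"]),
    "dispdt": pd.to_datetime(["2021-03-19"]),
})

def mask_day(d):
    """Round a date to the nearer of the 8th or 23rd of its month."""
    day = 8 if abs(d.day - 8) <= abs(d.day - 23) else 23
    return d.replace(day=day)

masked_rpt = rec["rptdt"].map(mask_day)
offset = masked_rpt - rec["rptdt"]  # the shift applied to the report date

# Shift every other date by the same offset so time spans are unchanged.
rec["rptdt"] = masked_rpt
rec["dispdt"] = rec["dispdt"] + offset

print(rec)
print((rec["dispdt"] - rec["rptdt"]).iloc[0])  # 14 days, same span as before masking
```

The point for analysts: within a record, durations between dates survive masking, but calendar alignment with outside events (a specific holiday, a policy start date) does not.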
So that's the suppression chunk, and then there's missing data, which is either a function of data reporting or a lack of data collection. For example, again, Pennsylvania didn't report race for a while, for whatever reason, so that's just missing in the data set. And there's missingness across all variables, which is very standard in any sort of data collection. I've also listed the few variables not included in the public use files: date of birth, county of residence, worker ID, supervisor ID, and the incident date itself; we have the report date, not the incident date. Other things to keep in mind: if you link the data with AFCARS, the dates sometimes don't exactly line up, and not all states provide the AFCARS ID for linking, so for some states you just can't link with AFCARS. Some variables are more reliable than others; I always recommend doing a check on your variables. Like I said, if you're basing your whole thesis on your response variable, make sure you understand all the nuances that go with it. Some information could be more granular: we don't have the specificity of the injury, whether there was hospitalization, or the severity, and we don't have broader identification of family structures, like sibling identification. Similarly, these are imperfect measures. The example I came up with: we have variables regarding disability, which apply if a child is officially diagnosed, but given these populations it might be prohibitive to see a doctor and get an official diagnosis, even though the child is living with the reality of a disability. So there are nuanced, imperfect measures like that. When linking across years, you might observe that demographic information changes: sex, race, county; ultimately you just need to make a decision on how to use it. I do see I'm at time, Clayton, so let me speed through the last of this, which is administrative data.
Taking a step back, administrative data is collected through a data collection system; it is not created with scientific research design in mind, so it doesn't conform to rigorous criteria in that regard, and we end up with these nuances and state differences. That's just a function of working with administrative data; it's not particular to NCANDS. The other thing, really quickly, is that the AFCARS ID can contain unusual characters: if you use certain programming languages, you might see characters you're not used to seeing, and that can cause weirdness sometimes. I provided this table to show which states you can and can't link with AFCARS; for example, Pennsylvania, Illinois, and Vermont are pretty problematic. On the proportion of missing race and ethnicity, really quickly: you can see Pennsylvania is at 100% all the way up until about 2014. The red line is 50%, so anything even approaching that is probably not great; Wyoming as well. Other considerations: COVID affected the data; the data are aligned with disposition date, which you may want to reorganize; the data are also organized by fiscal year, which you can realign to calendar year; and differences in coding, such as missing-value codes, can vary depending on the variable. Just always have your codebook up, that's what I say, and seek assistance. I'll end there and leave this up, but we are always here to help. [Clayton Covington] All right, thank you, Sarah. Attendees, we unfortunately are not going to get to all of your questions, but we'll answer as many as we can as quickly as possible. The first question, from Caroline, asks: how are demographic data, for example race and ethnicity, and risk factors such as alcohol abuse of the child and alleged perpetrators, recorded? For example, are measures self-reported, or collected by interview by a social worker?
And are there metadata about data collection provided in the child file, so a researcher could account for variability between self-report versus social worker report across states? [Sarah Sernaker] That's a great question, and I'll say there's no information tracking it. It's very much a question on my end as well, and something I assume differs agency to agency, case to case. It's really a function of how an agency operates: is someone going out with time to collect all the pieces of information, or is something like race observed, written down later at the computer, after the fact? So unfortunately I don't know, and it's something to be aware of in the data. I will say one thing that speaks to it: in the missingness patterns of some variables, years, and states, I've seen all variables be either one or missing, and to me that looks like a data-entry situation where someone is indicating ones and nothing else. In those ways you can get a glimpse into how the data collection might have happened, but otherwise we really don't know, and it's something to be aware of that could be creating real differences in the data. [Clayton Covington] One quick follow-up to that: can you confirm that the report ID is the unique identifier, not the Child ID? [Sarah Sernaker] Well, the report ID identifies a report uniquely, but multiple children could show up on that report, and children on that report could show up on other reports. So really the report ID and Child ID together uniquely define an observation. [Clayton Covington] Okay, and we'll take one more before we wrap up.
The next question says: perhaps a researcher isn't interested in all the reports a child experiences, but would rather collapse to the child level, or collapse to the child level by year and include an indicator for multiple reports. Is it best to use the stfcID for merging, collapsing, and linking the data, rather than the chid plus state identifier? I've seen multiple ways of collapsing data. [Sarah Sernaker] I would say if you're working wholly within NCANDS and not working with AFCARS at all, do not use the stfcID, because, as I briefly showed in one of my tables, some states just don't provide it, and it's more imperfect in NCANDS than the Child ID. So if you're just using NCANDS, use the chid and the report ID for linking, and the chid to identify children. And if they're concerned about masking, the linkage breakage happens in the AFCARS ID too, so it's just not avoidable. [CROSSTALK] [Clayton Covington] Can you go to the next slide, Sarah? Yes, so Andres actually just answered one more question about the slides: you can find them in the chat now. As for next week, we're going to continue the discussion, talking about assessing reporting issues in AFCARS and NCANDS with Dr Alexander Roehrkasse, who is a professor at Butler University and a research associate of NDACAN. So you can join us again at the same time next week, 12:00 p.m. Eastern Time, on July 17th. Until then, thank you very much, and we will see you all again soon. [Sarah Sernaker] Thanks, all; feel free to email me with questions (sarah.sernaker@duke.edu). [VOICEOVER] The National Data Archive on Child Abuse and Neglect is a collaboration between Cornell University and Duke University. Funding for NDACAN is provided by the Children's Bureau, an office of the Administration for Children and Families. [Musical Cue]
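The child-level collapse discussed in that final question, keyed on `chid` as Sarah recommends, can be sketched in pandas. Toy data; `subyr` and `rptid` are assumed column names to check against the child file codebook:

```python
import pandas as pd

# Toy child file: one row per child-report.
df = pd.DataFrame({
    "chid":  ["A1", "A1", "B2", "B2", "B2", "C3"],
    "subyr": [2020, 2020, 2020, 2021, 2021, 2021],
    "rptid": ["R1", "R2", "R3", "R4", "R5", "R6"],
})

# Collapse to one row per child per year, with a multiple-report indicator.
child_year = (df.groupby(["chid", "subyr"])
                .agg(n_reports=("rptid", "nunique"))
                .reset_index())
child_year["multi_report"] = child_year["n_reports"] > 1

print(child_year)
```

Within a single state and submission year this is straightforward; across years, remember the breakage-and-linkage caveat, since in a breakage year the same child carries a different `chid`.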