[MUSIC] [VOICEOVER] National Data Archive on Child Abuse and Neglect. [Paige Logan] Hello everybody. It is 12:00, so I will go ahead and get started for today. Welcome to the 2024 NDACAN Summer Training Series. My name is Paige Logan. I am the new graduate research associate here at the Archive; I'm taking over for Clayton. It's nice to be here with all of you. Before we get started, just a few reminders. We will be taking questions at the end of the presentation, but as you're listening, please feel free to submit any questions that pop into your head using the Q&A box at the bottom of the Zoom screen. Hopefully we'll have some time at the end to answer as many of them as possible; we will answer them in the order they come in, so feel free to type those in as the presentation progresses. This session is being recorded, and slides are available for all of the Summer Series presentations on our website. If you have any questions or need support with Zoom, you can use the link on the screen or reach out to Andres Arroyo, the NDACAN archiving assistant. Next slide, please. The theme of our series this year is Best Practices in the Use of NDACAN Data, and just as a reminder, all of the information about the datasets in the Archive is available on the website. We also have a listserv if you'd like to stay up to date on future offerings and events throughout the year. Next slide. Today's session will cover the NYTD dataset, including some of its strengths and limitations, and we will be hearing from our statistician Sarah Sernaker as well as Tammy White from the Children's Bureau. The Archive is funded through the Children's Bureau, which is under the Administration for Children and Families within the Department of Health and Human Services. Next slide. This is actually the last session of our Summer Training Series. You can see all the topics we've covered since the beginning of July, and if you've missed any of these sessions and are interested in learning more about any of the topics, all of those recordings will be available on our website to check out on your own time. I also want to plug our Monthly Office Hours series coming up in September. That is a special series because we will be providing free R training for the first 30 minutes of each session, followed by the typical office hours format. It starts on September 20th and runs through May of next year, I believe. Again, all of that information, including the registration link, is on the website. And with that, I will turn it over to Sarah to get into our presentation. [Sarah Sernaker] Thank you, Paige. Just to follow up on what she was saying about our Office Hours: we'll be starting from the ground floor of R, so if you've never used R, this will be a great way to learn it, from installing R and RStudio, to packages, working our way up to advanced models by the end of the year. We're really excited about that. But today, our last Summer Series talk is about NYTD, the National Youth in Transition Database. We'll go through a little background on what the data encompasses, and then the strengths and limitations.
And the best person, I think, to give NYTD background is Tammy White from the Children's Bureau, so I'm going to pass it over to her for a little introduction and context. [Tammy White] Hello everybody, thanks for joining. As Sarah said, I'm Tammy White from the Children's Bureau; I do a lot of the management of this dataset that goes to the Archive for you all to use. Briefly, for those of you who may not be familiar with NYTD: it is a somewhat new, legislatively mandated database. It came out of the Chafee Foster Care Program for Successful Transition to Adulthood, which has gone through a couple of iterations; the legislation dates back to 1999, and states began collecting the data in 2011. The program provides flexible funding to states that provide independent living services to young people who are currently or formerly in foster care and transitioning to adulthood. It also has a survey component, which helps researchers look at outcomes for youth who are in foster care or who've aged out of foster care. The law requires each state to develop a data collection system to capture information on both of these areas: the Outcomes Survey and independent living services provision. Like I said, the program serves young people who are likely to remain in care until age 18 and who will probably exit without a permanent placement, and the survey covers young adults ages 17 to 21. They are first surveyed at age 17, and that survey looks at six broad outcome areas, which you'll see on your screen: financial self-sufficiency, education, high-risk behavior, homelessness, and so on. There is also a component where states report independent living services, which are a variety of services captured in broad categories: academic, financial, health, home management, those kinds of things. As Sarah goes through the presentation she'll highlight a lot more about that, so I will turn it over to Sarah, our main presenter today. Thank you. [Sarah Sernaker] Thank you so much, Tammy, for that introduction to NYTD. So again, that's the National Youth in Transition Database, and as Tammy was saying, there are two prongs: one prong is not an actual survey but a data collection on services received, and the other prong is an actual survey of outcomes for youth aging out of the foster care system. That data collection leads to two datasets, or two data umbrellas. The first one, usually of most interest when people order NYTD, is what's called the Outcomes Files. This is survey data given to cohorts who are turning 17 at the time of data collection. The criterion is that they turn 17 while still in foster care, with no permanent placement setting on the horizon; this is youth aging out of foster care. There are a few cohorts of the survey. The first survey was given in 2011, and so that's what's called the 2011 cohort. In 2011, youth who were 17 and in foster care were surveyed at wave 1 and asked a series of questions about services they were receiving, employment, incarceration, insurance, and so on, which I'll get into more. They respond at wave 1 and then are followed up every two years, at ages 19 and 21.
So there are three waves for each cohort, and there are three complete cohorts as of today: the 2011, 2014, and 2017 cohorts. There are also two cohorts in process. The 2020 cohort has had two waves by now, but only one wave is available from the Archive right now; the second is currently being processed. The 2023 cohort would have just received its first wave of data last year, so that data will probably be forthcoming in the next months or so. When you order the data through us, you identify which cohort you want, which usually comes down to what time frame is of interest to you, and each cohort comes in its own package. If you ordered the 2017 cohort, you would get all three waves of data. If you were to order the 2020 cohort right now, you would only receive wave one, because that's all that we have; the packages are cumulative, so each package contains as many waves as are available to NDACAN. So that's the Outcomes Survey, and I'll talk more about it. Then there are the Services Files. The Services File runs from fiscal year 2011 to 2023; we actually just put out the latest file. This is pretty similar to the AFCARS collection in that it's about children who are usually, but not necessarily, in the foster care system and are receiving some sort of service paid for and administered by the Chafee program. It has measures similar to AFCARS: services received, demographics, and a few granular pieces about the child welfare experience. The Services File is a cumulative data file; if you're familiar with our AFCARS and NCANDS, you usually get those data files year by year, but this package is a one-stop cumulative file that includes all data from 2011 to 2023. And while we have these two prongs, a common misconception is that everyone appears in both; in fact, only a small number of children who are surveyed in the Outcomes File can actually be found in the Services File, so there is very small overlap. That's because the Services File includes children at more ages, for instance, and also children who are not necessarily in out-of-home care, so it's more encompassing and covers a larger population. Only about 5% of those who receive services can actually be found in the Outcomes File. I think that's an important point: if you were to utilize both, which is very valid, you just might not be able to link as many people as you might have expected. So, the Services File; I'm going to start with this, and then we'll move to the Outcomes. The Services File, like I said, is longitudinal records for all youth who receive at least one independent living service (one of the 11 I listed on slide two) paid for or provided by a Chafee-funded agency, regardless of foster care status and regardless of age. Data are submitted every six months: every six months, data are collected on who received services in the past six months, then submitted, compiled, and added to our cumulative file. So a child will have a record for each six-month period during which they received services. Something to keep in mind: if you're looking to do analysis at a yearly level, you would probably have to account for children who appear twice within the year.
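To make that concrete, here is a minimal R sketch of collapsing the six-month Services records to one row per child per year. The file and column names are hypothetical stand-ins; the real variable names are in the Services File codebook.

```r
library(dplyr)

# Hypothetical file and column names -- check the Services File codebook.
services <- read.csv("nytd_services.csv")

# A child can have up to two records per fiscal year (one per six-month
# reporting period), so collapse to one row per child per year first.
yearly <- services %>%
  group_by(child_id, fiscal_year) %>%
  summarise(n_periods    = n(),                        # 1 or 2 reporting periods
            any_academic = any(academic_support == 1), # received service at all?
            .groups      = "drop")
```

Whether you take `any()`, a sum, or the latest record depends on your research question; the point is simply to pick a rule before treating the file as yearly.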
And again, it's data on independent living services such as academic support, career prep, employment training, and various mentoring services. So then we get to the Outcomes File. The Outcomes File, as I said, is survey data, so there's only one record per child per wave. A child will appear in the data even if they didn't respond; in that case there will be missing data across the board, and I think there is an explicit indicator that they did not participate. The reason they're retained is so you can observe the demographic data; you still have those demographics and can see the full Ns. So there's demographic data for all the baseline youth, anyone who was asked to take the survey, even if they did not respond. The data come in long format. When you order the Outcomes Survey data from us and there are multiple waves (basically, any cohort before 2020 will have all three waves), the data are stacked by wave, and the surveys ask the exact same questions across the three waves. Youth are surveyed at wave one with a series of questions, and those come out as variables; there's also a variable that records the wave. Wave will say one, and as new waves come in we append them onto the existing data, and the wave variable gets a value of two, and so on. So the unique combination of the child ID and the wave number will uniquely identify an observation in surveys that have multiple waves. In short, it's in long format with a wave variable to distinguish waves. As I mentioned, the first cohort was 2011, and new cohorts are surveyed every three years. Within a cohort, youth are surveyed every two years, at ages 17, 19, and 21, but new cohorts of 17-year-olds are started every three years: 2011, 2014, 2017, 2020, 2023. The file contains demographic information about the youth, and then a lot of binary measures, which is one of the points we'll talk about later. The Outcomes File is really a bunch of binary questions: have you ever been incarcerated? Are you employed right now? Are you receiving this service? Has this ever happened? As you can see on my slide: homelessness, incarceration, marriage, parenthood, all the various things. And I just put this together; I like visualizing this, it's helpful for me to see the progression, and I tried to arrange it in time sequence. So we have the 2011 cohort; this is just to show which data we have. It's hard to tell, but the dark, bolded colors are data that are available right now through our Archive. So we have all of cohorts 2011, 2014, and 2017. For the 2020 cohort, if you notice the subtle difference, wave 1 is available now and you can order it from us; wave two, collected in 2022, is forthcoming; and the 2024 data collection will obviously be taking place this year. We don't have any data right now for the 2023 cohort, as its first wave just happened last year, and between data deposits and cleaning, that will be forthcoming later. But this is also just to show you that these data continue to be collected, so the trove of information really does keep growing. As I mentioned, the Outcomes Survey is a survey, so you might be wondering about the survey weights: who is the target population, and what is the survey design?
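Before getting to the design, a quick sketch of what that long format implies in practice. The variable names here are hypothetical, so substitute the ones in the codebook.

```r
library(dplyr)
library(tidyr)

outcomes <- read.csv("nytd_outcomes_2017.csv")  # hypothetical file name

# One row per child per wave: (child_id, wave) should be a unique key.
stopifnot(!any(duplicated(outcomes[, c("child_id", "wave")])))

# Optionally reshape to one row per child, with wave-suffixed columns.
wide <- outcomes %>%
  pivot_wider(id_cols     = child_id,
              names_from  = wave,
              values_from = c(employed_ft, incarcerated),  # hypothetical outcomes
              names_sep   = "_w")
```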
So the baseline population, who this survey is talking about, is all youth who reach their 17th birthday in the year and are in foster care within the 45-day period following that birthday. So they're turning 17, and within 45 days of their 17th birthday they are still in foster care. You have to draw a line in the sand somewhere, and this is the distinction that defines who is being surveyed. It was chosen because children who are turning 17 and still in out-of-home care have a lower chance of reaching a permanent placement setting at that point. All youth in the baseline population, all youth who meet these criteria, are eligible, and states are required by law to contact them and ask them to complete the Outcomes Survey. That doesn't mean they necessarily respond, but if they meet the criteria, they are required to be asked. So no random sampling is done; there's no sophisticated sampling frame or anything. The target population is defined as what it is, and because part of the definition of the target population is that these children are in foster care, the state should be able to explicitly identify everyone in the baseline population; they are in foster care, so they're in the system. Everyone who meets the criteria is required to be contacted, and no random sampling or complex design is needed. So everyone is asked to complete it, and as I said, people don't necessarily respond, so the cohort is self-selected in that way. It's a nonprobabilistic sample of youth from the baseline population: everyone is asked, but some respond and some don't, and the people who respond can have different characteristics from the people who don't. That's why it's a nonprobabilistic sample. Here is more explicit detail on the cohort definition: to be in the cohort, you have to be in the baseline population, in foster care on the day of the survey, and you have to participate in and complete the survey. This is what defines the cohort and the population who is followed up on. Basically, everyone in the target population has to be asked to participate in the survey, and those who do respond then define the cohort itself. Those people are followed at wave two and wave three; people who don't respond at wave 1 are not asked again in subsequent waves. So that's the formal definition of the cohort: while everyone is asked to respond, only those who do respond are asked to participate in subsequent waves. Youth who complete wave one are followed up two and four years later with the same survey. And most youth in the cohort are eligible to take the wave 2 survey. I say most youth because there's one caveat, which I'll get into in a second: states can do a subsample. If a child is not in a quote-unquote subsampling state, then they are inherently eligible for wave two; it's really just these states that take a sample. Let me back up. At wave one, everyone in the target population has to be asked to participate in the survey, and some people respond and some people don't.
Those who respond at wave one are then the cohort and are followed up at wave two and wave three. All states have the option to take a sample at wave two. I was talking to Tammy about why, and she told me this is to reduce the burden, especially on large states, of reaching these children and surveying them, which is burdensome on workers and on the youth themselves. So states can opt to sample at wave two, and they take a simple random sample of the youth who responded at wave 1. Again: everyone is asked at wave 1, and then some states can decide to take a random sample of those respondents, and that defines the follow-up population. In such states, only those subsampled at wave two will be surveyed at waves two and three; the subsample defines a sort of subcohort for both waves. They don't do any further sampling at wave three; it's not a different sample of people. Sampling happens once, at wave two, and only in some states. And there are regulations that dictate the sampling frame, sampling method, and sample size calculation; I've included a link with more explicit details. [ONSCREEN] Link to “Appendix C to Part 1356—Calculating Sample Size for NYTD Follow-Up Populations”, https://www.law.cornell.edu/cfr/text/45/appendix-C_to_part_1356 [Sarah Sernaker] I say the regulations dictate the sampling frame and method, but it's really no more complicated than a random sample. States do have to meet certain criteria and thresholds for sample size, though, so they can't just sample 10 people and move on. As I mentioned, sampling is done once at wave two, and that same sample is used for both follow-ups, meaning the wave two and wave three surveys. Only youth in the sample are eligible for follow-up, so even if you responded at wave 1, if you weren't in the subsample you wouldn't be asked to take the survey again. To give an example of the states that use sampling: from the 2020 cohort, in my bottom bullet here, there were 16 states that used sampling; you can see Arizona, Colorado, etc. Tammy and I were remarking, looking at this list, that sampling is an opportunity for large states such as California or Florida to reduce the burden, but you actually don't see them partaking in the sampling, which is an interesting piece: California still has the resources to complete its full sample. Anyway, this is the 2020 cohort; previously I had listed the 2017 cohort, and it was exactly the same except for Arizona, so I think this list is pretty consistent across most of the cohorts.
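For a flavor of the sample size calculation mentioned above, here is the standard Cochran formula for estimating a proportion within ±5 points at 95% confidence, with a finite population correction. This is a generic textbook illustration, not necessarily the exact computation in Appendix C; follow the link above for the regulation's actual rule.

```r
# Generic Cochran sample size for a proportion, with finite population
# correction. N = a state's count of wave-1 respondents; not the official rule.
follow_up_n <- function(N, z = 1.96, p = 0.5, e = 0.05) {
  n0 <- z^2 * p * (1 - p) / e^2      # infinite-population size, about 384
  ceiling(n0 / (1 + (n0 - 1) / N))   # shrink for a finite population
}

follow_up_n(600)  # a state with 600 wave-1 respondents would need about 235
```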
On survey administration: as is the case with all of our administrative data, each state uses its own sampling frame and methodology, and each state has discretion to choose how to administer the survey: online, over the phone, sending people out. Some states may do additional outreach or follow-up, while others may make a single call, and if you don't answer, that's it. So survey administration varies a lot from state to state. Other things to note: no one can answer for the youth. This is not a parent or a caretaker responding for the youth. Nor can data from other sources be used to answer questions; researchers or survey collectors can't fill in the blanks, even if they happen to know a fact about the child. The youth have to answer for themselves, bottom line. And participation in the survey is completely voluntary on the part of the youth. If you're familiar with surveys, there are a few things here that might cause concern: we're talking about 52 different survey administrators and voluntary response. But this really is the best survey out there on youth aging out of foster care; there are just limitations that go along with it, and that's the whole point of what we're talking about today. It's good to keep these things in mind, especially when you do research at a national level and want to compare results and outcomes between states. Differences arising simply from survey administration, or just differences between states, are a real consideration, as is always the case with administrative data. I also want to talk briefly about response rates. This is in the user's guide, and I should have said at the beginning: you should definitely check out all of the documentation, the user's guide and codebooks. Even preparing this today, I pulled up the codebook, as I do anytime I'm using the data, just to remind myself of the definitions. One of the things in there is a discussion of response rates, so I wanted to touch on it. At wave 1 the response rate is intuitive and standard: the numerator is the number of youth who responded to wave one, and the denominator is the number of youth in the baseline population. I have a visualization in a few slides; calculating response rates yourself is a good way to see how well a state has surveyed or collected data. It's a measure of who responded, and you can look at how that varies between demographics or characteristics. For the response rates at waves two and three, there are two ways to think about it. The numerator is always the number of youth who responded at that wave, but there are two denominators you could consider. Method one uses the baseline population as the denominator, so at wave two this is basically the proportion of the baseline population who responded at that wave. The second method is more a measure of attrition to that wave: the numerator is still the number of youth who responded at that wave, but the denominator is the number of youth who responded at wave 1 and were eligible for subsequent surveys. It's a measure of attrition because the denominator is not the full baseline population but the population who actually responded at wave one; it asks what proportion of those people came back at wave two or wave three. And I should mention: if a youth responded at wave one but didn't respond at wave two, they would still be asked at wave three. If you responded at wave one, you will be asked to participate in both subsequent waves, whether or not you responded at wave two.
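In code, the two follow-up response rates differ only in their denominator. A minimal sketch, assuming hypothetical 0/1 indicator columns for baseline membership, participation at each wave, and follow-up eligibility:

```r
library(dplyr)

# Hypothetical data frame `youth`, one row per youth, with indicators:
# in_baseline, responded_w1, responded_w2, and eligible_w2 (the subsample
# flag; equal to responded_w1 in states that do not sample).
rates <- youth %>%
  group_by(state) %>%
  summarise(
    rr_w1        = sum(responded_w1) / sum(in_baseline),
    rr_w2_base   = sum(responded_w2) / sum(in_baseline),                # method 1
    rr_w2_attrit = sum(responded_w2) / sum(responded_w1 & eligible_w2)  # method 2
  )
```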
Reading my notes: any youth in the cohort from a state that didn't sample is eligible; for sampling states, only the youth in the sample should be included in the denominator, and that response rate tells you the proportion of youth who responded among those who were eligible. These rates are interesting to think about, and it's just a matter of which population you want to frame your proportions with respect to. They give a sense of state participation and, I guess, survey success: how many youth has a state surveyed? How many youth can they retain across the waves? And further, you can look at response rates by demographics: attrition and overall response rates might vary by race or sex, and sometimes that can be informative in research for understanding why differences arise across such groups. So, after going through all the background, let's talk about the strengths and then the limitations of NYTD. [ONSCREEN] Link to NYTD reports and summaries on the Children’s Bureau website: https://www.acf.hhs.gov/cb/research-data-technology/reporting-systems/nytd [Sarah Sernaker] As with all of our datasets, it's the best available data on the subject, and that's just a matter of fact: there is no other survey explicitly measuring outcomes of children aging out of foster care. This particular population, for the Outcomes Survey I mean, is of main interest because youth aging out of foster care are especially prone to poor outcomes and need a lot of support; they don't have a permanent placement setting or the stability of a home. And on the services side, the Services File is the only measure for understanding who is receiving Chafee services, and over time, too. There are many years of data, so it supports long-term tracking and longitudinal analysis of changes over time, such as trends or historical impacts like Covid. At this point in time there are multiple complete cohorts of data, and as time goes on we will continue to accumulate full cohorts. So we'll be able to see not only changes happening within cohorts but changes between cohorts over time, and the power of those analyses will only increase as more surveys are collected. As always with any of the data NDACAN archives, we offer lots of data support and resources, especially for NYTD. I was looking at the page on our website that links past Summer Series training webinars, and a large proportion are dedicated to NYTD, so there's a lot of good material there; and not just on our website, the Children's Bureau website has a lot of really great resources too. Another big strength of NYTD: I haven't explicitly gone into the variable, but the files include the foster care ID, which is consistent with the foster care ID found in AFCARS and NCANDS. So it is linkable to our other datasets, with some limitations, as detailed in the NCANDS and AFCARS Summer Series sessions: broadly, issues with linkage and state encryption situations. The last strength I've listed here is that states actually use this data to inform policy changes and funding needs, to provide better services, and to understand which people, places, or states need more attention. That's the theme, in the same vein as all of the administrative datasets.
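As a sketch of the linkage just mentioned, assuming the ID variable has already been harmonized across files (the file and ID names below are placeholders; the user's guide documents the actual linking variables):

```r
library(dplyr)

nytd   <- read.csv("nytd_outcomes_2017.csv")  # hypothetical file names
afcars <- read.csv("afcars_fc_2017.csv")

# Keep only youth found in both files; expect imperfect overlap given the
# state encryption and ID issues mentioned above.
linked <- nytd %>%
  inner_join(afcars, by = c("state", "fc_id"),
             suffix = c("_nytd", "_afcars"))
```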
And really, I think the biggest benefit is understanding how to best use resources, so they're not wasted, and so those in need receive the care and attention they seek. Another big strength of NYTD is its data reporting standards. This is in contrast to NCANDS and AFCARS; AFCARS has an explicit list of variables that need to be measured, but state definitions and policies can more deeply affect how those variables are measured. With NYTD there are data reporting standards, and states must comply with them or face penalties. These include standards such as file format requirements and thresholds for errors and missingness: 90% error-free for data elements, meaning completeness and internal consistency, not just random nonsense or a bunch of no's everywhere. States must collect full or partial Outcomes Surveys from all 19- and 21-year-olds in the follow-up population, or indicate why not, and they have to garner participation in the Outcomes Survey from at least 60% of 19- to 21-year-olds in the follow-up. I've listed all these out, and Tammy and I were having a conversation because I thought 60% seemed a high bar; we're basically talking about retaining 60% of the follow-up population, and in any survey that's a medium-to-high bar, I'd say. Tammy was telling me that a lot of states do just face the penalties and accept them, because it's hard to get people to respond to surveys, so states may often not meet all of the standards explicitly. But they are always pushed to do better. Okay, so those were the strengths, and now we have the limitations of NYTD. I find the biggest limitation when working with NYTD is that there's just not much granularity in the data. A lot of variables are simplistic binary variables; as I said, it's basic measures like: have you been incarcerated since the last wave? Have you gotten married? Are you employed full-time? Are you employed part-time? Are you enrolled in school? Very basic yes/no questions. While that's informative on its face, I usually find researchers want to dig deeper, and there's just not much more depth to it than that. It gives a high-level summary and definitely shows trends, but I always find I want more, so I think that's a limitation of NYTD. Another one is that some states have low sample sizes and response rates. As I said, each state is its own data collection system, and some states are better at it than others. Some states have pretty low sample sizes, so if you wanted to do a statewide analysis and you have fewer than 50 people, say, or even fewer than 30, you're just not going to have much power, and that can be unfortunate in some cases. Another limitation: only a small proportion of youth in the Services File are in the Outcomes File. It's more of a fact than a limitation, but I've put it here because I think there's a misconception that everyone in the Outcomes File can be found in the Services File, or vice versa, when really there's just a small overlap. Again, it comes down to the fact that the Services File encompasses more people by definition, including people who are not necessarily in out-of-home care or foster care but are receiving Chafee services. It's just a difference in how the data are collected and who they cover.
They are linkable, but again, you're not going to match everyone from the Outcomes File in the Services File. Another limitation, as always with our administrative datasets, and the most serious one that should be considered with really any administrative data: there are state differences in statutes and definitions, which affect how data, and child welfare outcomes, are defined and collected. In this instance it probably comes out more in how data are collected, in the survey procedure, and how each state manages it on its own. Similarly, there's state-to-state variation in child maltreatment laws and information systems, and simply the computer systems states use to input and collect the data vary, which may affect interpretation. These issues are definitely more common in NCANDS and AFCARS, I think, but they are still present here. [ONSCREEN] Figure of three histograms stacked on top of each other showing the distribution of respondents over all states in 2011 (orange bars), 2014 (green bars), and 2017 (blue bars). [Sarah Sernaker] So this is my break from all the text; I have some figures here. This figure shows the distribution of state sample sizes: on the x-axis is the number of wave-1 respondents, in buckets of 50, and the bars count how many states fall into each bucket, by cohort. For instance, in the 2014 cohort, about 11 states had fewer than 50 people in their wave 1 survey. And it doesn't get any better than wave one, which is why I stuck with wave one: you take your sample at wave one, and the numbers only go down from there. Each bar is a bucket of 50, so again, this is saying that over 11 states have between 50 and 100. The explicit details are not so important; the trend shows that most states only have a sample of about 100 or 200, and in the grand scheme of data collection that is sometimes not sufficient for advanced modeling; analyses simply have lower power the fewer people you have. This was to give a sense of the scope and scale of how many people respond. There are a few states out here: one state is actually collecting data on close to 2,400 people, probably California, though I didn't look it up, and one state is collecting data on over a thousand. Those are good sample size numbers; a higher sample size is obviously better. This was just something I thought was interesting to look at.
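A figure like this is straightforward to reproduce yourself. A hedged ggplot2 sketch, with hypothetical column names:

```r
library(dplyr)
library(ggplot2)

# Count wave-1 respondents per state within each cohort (hypothetical columns).
state_n <- outcomes %>%
  filter(wave == 1, participated == 1) %>%
  count(cohort, state, name = "respondents")

# Histogram in buckets of 50, one color per cohort, as on the slide.
ggplot(state_n, aes(x = respondents, fill = factor(cohort))) +
  geom_histogram(binwidth = 50, position = "identity", alpha = 0.6) +
  labs(x = "Number of wave-1 respondents",
       y = "Number of states",
       fill = "Cohort")
```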
But moving on, another limitation that comes up with our other administrative data as well is masking and suppression. NYTD does undergo some masking and suppression for confidentiality and disclosure risk purposes, and the masking done in the NYTD files closely mirrors what's done in AFCARS. County identifiers, the county FIPS codes, are masked based on the AFCARS masking of the same year. What that means is: if we're looking at the 2017 wave 1 data, then a county is masked in the NYTD file if it was masked in the 2017 AFCARS file, and a county would be masked if it had fewer than 700 records in the corresponding AFCARS file of that year. A little bit convoluted, and it's written in our user's guide with probably more clarity, but that's just to say that the record counts that dictate the masking procedure come from AFCARS. So counties are masked, and this almost always eliminates identifiability of small counties and rural populations, which is the bottom-line takeaway. There's also date masking: the date of birth is masked to the 15th of the month, and any other dates are shifted similarly. I forgot to look at the codebook, which I always have to refer to, but I don't think there are any other dates in the Outcomes Survey, so that would be more relevant to the Services File. I can't remember whether date of birth is in the Outcomes File. [Tammy White] Date of birth is in the Outcomes File. [Sarah Sernaker] Thank you, Tammy. So we have county masking and date masking, very analogous to the AFCARS masking procedure. The other limitation comes from survey limitations generally. As mentioned, youth self-respond; they're not selected randomly, so there's no guarantee that the cohort is representative. I'd say it mostly is, and there are actually survey weights available, which I think I have a slide on. They're somewhat simplistic, but they do undergo post-stratification and the usual weight construction and weight trimming. So my point is, even though youth self-respond, there are ways to adjust for that to make the sample representative of the broader population. But as I said, people who choose to respond to a survey might have different characteristics and demographics from those who don't. Also, some youth just can't respond in later waves because they're unavailable for various reasons: they might be incarcerated, or they might be in the military, and we don't actually have a way to explicitly measure why they didn't respond. In those ways, you can think of the cohort as not representative of people who enter those pathways. Again, it's a limitation; all surveys have limitations, so it's just something to keep in mind. States administer surveys differently, which may lead to varying survey design bias; some states might have more bias built in than others. There's wave nonresponse for various reasons, and rates can vary by certain characteristics. Some states require parental permission for youth not of age to respond; I did mention that youth all respond for themselves, so I think this is actually not very prevalent. [ONSCREEN] A figure of 51 adjacent graphs indicating the proportion of baseline population responding at each wave by cohort for the 50 U.S. states and the District of Columbia. [Sarah Sernaker] Another figure that I created here: this shows survey response rates out of the baseline population. The x-axis shows the different waves, and these are all response rates with respect to the baseline population, so this is method one of what I outlined previously: not attrition, but with respect to the baseline population.
States with a solid line do not subsample, and states with a dashed line take a subsample at wave two. And then there are the various cohorts, 2011, 2014, and 2017; I just included the cohorts that are complete. A few interesting pieces. Naturally, the first wave is going to be the highest; people tend to drop out of surveys as waves continue, and that's just a characteristic of all surveys. Other points to note are the interesting comparisons of states. Arizona is notoriously low; they just have poor response rates, and I can't speak to anything beyond that. What we're observing here is that they have very poor response rates. Texas, you can see, also has a significant drop, but they do sample, so part of the drop is just because they're taking a sample of people and then surveying that subgroup. I find these sorts of things interesting, especially the comparisons between states: they give a sense of how good a job each survey administration seems to be doing, and whether they're improving over time. If that were the case, we'd expect to see the blue line on top of the green line on top of the red line, meaning they're improving their surveying techniques over time. Rhode Island seems not to be improving, while New York may be improving, reaching more people as they collect data on new cohorts. [ONSCREEN] A figure of 51 adjacent graphs indicating the proportion of response from subsequent waves by cohort for the 50 U.S. states and the District of Columbia. [Sarah Sernaker] And then, as a quick comparison: the first figure was with respect to the baseline population, and this one is with respect to the previous wave, so this is really just attrition to each wave. This is a more stable measure because the denominator is smaller, which is why these rates look higher, but the trends are interesting too. Again, Arizona: here it looks better, so the people they do sample seem to be staying in the survey, though in 2017 they apparently didn't do as good a job. You could even look at survey response rates by race or by sex, and again, that will give you a sense of who the data in front of you best represents. The other limitation, which I've listed with the other datasets as well: this dataset has the AFCARS ID that's consistent with our other data holdings, and that's the ID that's linkable between them, so we find the same problem here. There are these quote-unquote weird characters, and I note them because they can cause problems in some programming languages, which can interfere with linking. If you're not linking, this usually doesn't cause too many problems, but it does exist. There are additional considerations I've tacked on at the end, again consistent with our other data holdings, because administrative data behave similarly. You should always do state-by-year exploration of any variables used for research, to understand missingness, reporting patterns, or discrepancies. That could just be running frequencies and crosstabs or doing basic visualizations.
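To make that exploration concrete, a minimal sketch: tabulate missingness by state and wave, then, per the multi-level modeling suggestion that comes next, let random intercepts absorb state differences. Variable names are hypothetical; glmer() is from the lme4 package.

```r
library(dplyr)
library(tidyr)
library(lme4)

# 1. Missingness of one outcome by state and wave -- look for states or waves
#    that stand out before trusting pooled estimates.
outcomes %>%
  group_by(state, wave) %>%
  summarise(pct_missing = 100 * mean(is.na(employed_ft)), .groups = "drop") %>%
  pivot_wider(names_from = wave, values_from = pct_missing)

# 2. A simple multi-level logistic model: a random intercept per state lets
#    the model acknowledge state-level differences in levels of the outcome.
m <- glmer(employed_ft ~ factor(wave) + (1 | state),
           data = outcomes, family = binomial)
summary(m)
```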
This is just to get a sense of the data; it's not to say that you need to do a whole research project on top of your research project, but you should have a sense of what could be driving differences in your analyses. Sometimes differences in analyses are simply driven by state-level differences in data collection, measures, or definitions. To that point, I would also suggest considering multi-level modeling, if that's relevant to your research goal and question. We have resources on our website; Frank Edwards has done a few talks about multi-level modeling. It's a great way to state in your model, "I know there are differences across states and years," and to account for them. Also refer to published reports, publications, and state footnotes for additional information on each state's reporting. Utilize what you can; if you're doing state-level analysis, contact states if you have to. Sometimes I say: try to talk directly to the state and really understand what's happening at that level. And then always seek assistance. That is the whole purpose of NDACAN: on top of distributing our data, we love helping users and making sure you can use our data and understand it appropriately. So there's us; the Children's Bureau has tons of resources, not just on NYTD but on AFCARS and NCANDS as well. And look at the literature out there: what variables are used, what limitations are people describing in their research, things like that. NDACAN has a Zotero database of citations for literature that uses any of our datasets, with a tag for NYTD, so you can use our service, called canDL, to directly see a lot of literature that explicitly uses the NYTD data. [ONSCREEN] The email address of Sarah Sernaker is Sarah.Sernaker@duke.edu. [Sarah Sernaker] And with that, I'll stop for questions. I think I've left a decent five minutes, but feel free to shoot me an email after this about this talk or any of the others you might have seen this summer, and I'm happy to answer any questions. [Paige Logan] Thank you, Sarah, and thank you, Tammy. There are no questions in the Q&A just yet, but while we're waiting for questions I'm going to do another shameless plug for the R training and Monthly Office Hours series. I put the link in the chat; it will take you to the website with all the information. It has details about what will be covered at each session as well as the registration link, so please do check that out. I know I'm excited for it. [Sarah Sernaker] Yeah, I think it'll be really fun. I love R. All those visualizations you saw were put together in R, and as I said, we're going to start from the ground floor, like installing R and RStudio, so this is again a really great way to learn R if you've always wanted to. [Paige Logan] Or if you're like me and used it in undergrad and then never again and need to brush up on your skills. [Sarah Sernaker] Yes, or maybe you're an R user but you haven't used the Tidyverse; we'll have a whole session on the Tidyverse. I'm definitely a convert to the Tidyverse life. I was base R all through grad school, I will say, but once I got over the learning curve of the Tidyverse, it makes life super easy. So we'll be diving into the Tidyverse and ggplot visualizations.
[Paige Logan] I still don't see any questions, so maybe we'll give it another minute. We're also coming up on the hour, so if folks have questions afterward, like Sarah said, please do reach out. [Sarah Sernaker] All right, should we call it? [Paige Logan] Yeah, I think that sounds good. Thank you everyone for your time. Thank you again to Sarah and Tammy for presenting, and hopefully we will see folks in the fall. [Sarah Sernaker] Yes, thank you all for joining us, and for joining us for the whole Summer Series if you made it to other talks. This was our last one of the summer. Thank you. [Paige Logan] Bye everybody. [Sarah Sernaker] Bye. [VOICEOVER] The National Data Archive on Child Abuse and Neglect is a collaboration between Cornell University and Duke University. Funding for NDACAN is provided by the Children's Bureau, an office of the Administration for Children and Families. [MUSIC]