[MUSIC] [VOICEOVER] National Data Archive On Child Abuse And Neglect. [Paige Logan] Hi everyone, my name is Paige Logan. It is one minute after the hour, so we will get started. Welcome to the 2024 Summer Training Series hosted here at the National Data Archive on Child Abuse and Neglect. I am the new graduate research associate at NDACAN. I'm taking over for Clay Covington, so you'll probably be hearing my voice and seeing more of me on the listserv and at our training series and monthly Office Hours starting in the fall. I'm really excited to be here with everyone. Before we get started, just a few reminders. If you have any questions throughout the presentation, please use the Q&A box at the bottom of your Zoom screen, and we will answer as many as possible, in the order that they come in, at the end of our time together. All of our sessions are recorded, and the slides and video for this summer's presentations are available on the NDACAN website. If you have any questions or need support with Zoom, you can use the link on your screen here or reach out to Andres Arroyo, the NDACAN Archiving Assistant. Next slide please. If this is your first time joining the Summer Training Series, welcome. The theme of our series this summer is "Best Practices in the Use of NDACAN Data," and we're hoping to share some key considerations as well as tips and tricks for both new users and folks who may be more familiar with these data sets. Next slide. Our session today is called "Approaching NSCAW 3 for New and Experienced Users," and we're excited to welcome our friends at RTI International who will be presenting today. If you were not aware, NDACAN is funded through the Children's Bureau, which is under the Administration for Children and Families within the Department of Health and Human Services. Next slide. We are actually almost finished with the Summer Training Series. This is session five of six: we've had sessions on both the NCANDS and AFCARS data sets, we had a session on survey design and using weights, and today we'll talk about NSCAW 3. After today we have one more session, on August 14th, covering the NYTD data set's strengths and limitations. With that, I will hand it over to Keith from RTI International to take us through the presentation today. [Keith Smith] Hello everyone, my name is Keith Smith and I'm from RTI International. I'm the Associate Project Director for NSCAW and have been on the project since 2005, so going on 20 years. I just wanted to review what we'll be covering in this presentation. We'll start with a brief overview of NSCAW, then we'll talk about the generation of your analysis files and analysis variables. We'll then outline some procedures you can use for QCing your analysis variables. Then we'll discuss some ways you can manage and document your analysis files and variables. And finally we'll review some analysis issues that you may encounter. Next slide. For a little bit of background on NSCAW, this table is a summary of the three NSCAW cohorts. The NSCAW 1 cohort began back in 1999. There were 5,501 completed interviews, and the baseline round was conducted with children age 0 to 14 from November of 1999 to December of 2000. For NSCAW 2 we had 5,872 completed interviews with children age 0 to 17 and a half, and the baseline round was conducted from March of 2008 to April of 2009.
And then finally, for NSCAW 3, which is the cohort that we're primarily going to be discussing in this webinar, we had 3,298 completed interviews, again with children age 0 to 17 and a half, and the baseline interviews were conducted between August of 2017 and March of 2022. A couple of notes: there was something in the NSCAW 1 cohort called the Long-Term Foster Care Survey, but it was only conducted in NSCAW 1. And for NSCAW 3, I just want to mention that because of the COVID pandemic, baseline data collection was paused for a pretty long period of time, from March 2020 until May 2021. Next slide. This table indicates what instruments were administered with what types of respondents for the three cohorts, so you can see which respondents we conducted interviews with. I wanted to note that for NSCAW 3, as opposed to the first two cohorts, no teacher survey was administered, and also that in NSCAW 3 we had agency director surveys that were completed as part of a separate set of workforce surveys; you'll be getting more information about the workforce data sets in the upcoming months. Next slide. We also wanted to put together a slide that has some important documentation for the study. If you're interested in receiving the NSCAW data sets and documentation, you should contact NDACAN; there are also instructions for applying for the data on NDACAN's website. And there are three documents on OPRE's website that you may be interested in, which give you an overview of NSCAW. There is a crosswalk of the constructs and measures across the three cohorts, so it gives you a good overview of what kinds of measures we collected in all three cohorts. It's a fairly lengthy document, but there are some interesting tables that show what was administered across the three cohorts, so that's the link to that. We also recently published the NSCAW 3 Baseline Introductory Report, which has a lot of good information about the NSCAW 3 baseline wave. And finally, on the OPRE website's NSCAW web page, there is a listing of several products for all three NSCAW cohorts that you might find helpful. So with that, I'll hand it off to Marianne, who will go over the rest of the presentation. [Marianne Kluckman] Thanks Keith. So like Keith said, my name's Marianne Kluckman. I also work for RTI. I have not worked on NSCAW as long as Keith has, only a couple of years, but I'm a SAS programmer and I've been at RTI a long time. To that note, I am going to be showing snippets of code and output on the screens, and it's all in SAS, so apologies to those of you who don't use SAS. It's not complicated, so you'll be able to follow, and you'll be able to do similar things in your software package of choice. Today we're going to be talking about NSCAW data, from creating an analysis file up to presenting results. Some of this information is going to be specific to NSCAW, but we've also thrown in some general data management practices, suggestions, and comments as well. So the first thing to know about NSCAW data is that it's pretty complex. It has a complex sampling design, so we'll have to account for that. Like Keith mentioned, there are multiple surveys, all of which have complex skip patterns in them.
So you've got three different kinds of information coming from a child, a caregiver, and a caseworker instrument. On one record you could be missing one or two of those instruments, you could be missing modules within an instrument, or you could be missing specific items that got skipped or were not answered. And then there are derived variables that are also included. All of this documentation is included in the DFUM (the Data File User's Manual) and its appendices, and then you have links to the electronic codebooks. So when you get the data, you'll have all of that documentation that comes along with it. Just a note on the complex sampling design: if you're an NSCAW 2 user, the NSCAW 3 sampling design identification is slightly different. You don't have to identify a stratum, just a PSU. That's one difference between the two, because of the different sampling procedures. So these are some of the topics we're going to talk about when you're creating your analysis file and defining your analysis population. First off, where do you get your data? Where does it reside? What questions come from where? All of the instruments are separated into modules, and the modules have two-character names that are used within the variable names. You'll see that, for example, if we wanted to look at maltreatment, that topic is discussed in three different instruments. So if maltreatment is one of the variables that you want to look at, you're going to have to look in multiple places. It could be the exact same kind of information, like what's the child's age or what's the child's race, or it could be the same topic but with slightly different questions getting at different things. You have to look in multiple places. The second thing to think about is what kind of population you are focusing on. Some assessments were done on kids of specific ages, like younger kids versus older kids, and some questions are only asked of older kids. You won't have everything for every population. Out-of-home caregivers, like foster caregivers, are asked different questions than biological parent caregivers. You have to figure out who got asked what when you're focusing on the population that you're interested in, to make sure you have that information to move on. Even if you are focusing on a specific population, like, let's say, kids who are in an out-of-home situation, you don't want to subset your analysis file to that population. We're going to talk more a little later on about specifically why you don't want to do that, but you want to keep every record in your analysis file. You do, however, want to keep just the minimum variables. There are lots and lots of variables in the file, and you don't want to be including hundreds of variables that you'll never use. So you want to subset to the variables that you are interested in. Like I mentioned before, there are derived variables included in the data sets that you will get. Where is the information on that? Appendix 2 of the DFUM is where you'll find information on all the derived variables. It not only has the actual programming code, but for a lot of the variables there's an explanation of why we're doing it the way we're doing it, or, specifically for scales and assessments, some background information on that scale.
Like I mentioned, the scales and assessments are already created, so you don't have to do that piece. Variables that we thought would be commonly used in analysis are also created for you, like race and ethnicity. They are created because information on them is in more than one spot; more than one person may be giving information on that. So derived variables are created either from multiple variables or across multiple instruments. And again, all of that is documented in the DFUM. Sometimes you're going to have to create your own variables, right? Not every derived variable you're interested in is going to be there. So again, you have to look and see if that information is available in more than one place. You also have to look and see if there are any skip patterns that are going to influence who gets asked the question you're looking at. And the last thing I want to mention: even if the derived variable is there, or the raw variable is the only thing that you need, there could be multiple categories, and you may need to collapse categories. Specifically because, and we're going to talk about this at the very end of this presentation, when you're reporting, you want to have at least 11 kids within a category to be able to report it. So sometimes you have to collapse categories in order to get to that number. All right, now we're going to move on to actually talking about creating variables. Again, and I've said this like three times already, information is available in multiple spots. You could get it directly from the child, the caregiver, and the caseworker, and the Y, the P, and the C at the beginning of every variable name tell you which interview it came from. Any variable that starts with Y is coming from the youth, P is from the caregiver, and C is from the caseworker. Let's talk about a child's race. If you prioritize what the caseworker reported for the child's race over what the youth reported, you will get a different answer than if you prioritize what the youth said. If something is reported in multiple spots, you need to decide what you're going to prioritize. In other words, let's say you're looking at race and you're going to report what the youth said first; if that is unavailable, then you're going to go to the caregiver, and if that is unavailable, then you're going to go to the caseworker, so that you can get a race for every single child. But which order you prioritize those in will give you a different answer. It's all about documentation, right? Whatever you decide when you're creating a variable, you need to make sure you document it so that you can report on it later when you're describing the results for that variable. I talked about missing values. Obviously we don't have a value for every question; it could be missing for many reasons, including a skip pattern. It could be that the respondent didn't know the answer to the question, or they refused, they didn't want to answer the question. It could be that we don't even have an interview, say a caseworker interview, for that child. There are several reasons why values are missing, but they're documented with special missing values within the data.
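To make the prioritization idea and the special missing codes concrete, here is a minimal SAS sketch. This is not NSCAW's official derivation code (that lives in Appendix 2 of the DFUM), and the race variable names YCHRACE, PCHRACE, and CCHRACE are hypothetical stand-ins for the youth-, caregiver-, and caseworker-reported items; the only behavior carried over from the real data is that negative values on numeric variables are special missing codes.

data work.analysis;
  set work.raw;
  /* Prioritize youth report, then caregiver, then caseworker.  */
  /* YCHRACE, PCHRACE, CCHRACE are hypothetical variable names. */
  /* Negative values are special missing codes, so treat them   */
  /* as unavailable and fall through to the next reporter.      */
  if YCHRACE > 0 then CHRACE = YCHRACE;
  else if PCHRACE > 0 then CHRACE = PCHRACE;
  else if CCHRACE > 0 then CHRACE = CCHRACE;
  else CHRACE = .; /* no usable report from any instrument */
run;

Reordering the IF/ELSE IF branches is exactly the prioritization decision being described here, and it can change the distribution you report, which is why the chosen order belongs in your documentation.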
So for example, and these are all listed in the documentation that you get, this is a screenshot from the electronic codebook for the question PRS14. From the P in the variable name we know it comes from the caregiver, and SR is the module. This question is about job-related services, and you can see that there are some negative numbers among the values: we have negative 7 and negative 6, and then we have 1 and 2 for yes and no, which is pretty obvious. The negative values are these special missing codes. This is a numeric variable, so any negative value is going to be a special missing code, and those codes are consistent across variables. So negative 7 is a legitimate skip here, and you're going to see in other variables as well that negative 7 means the question was skipped based upon an answer to a previous question. There are other special missing codes, like negative 4 and negative 2, that just weren't applicable to the variable I have on the screen here. But you can see the negative values for the special missing codes, and those codes will become important when you're creating variables. You need to pay attention to what the special missing values are. And again, this is all documented; this is an actual screenshot from the codebook. All right, let's do an example here, because I think it's always helpful to see actual stuff. "Actual" is not exactly true: the variable names are correct, but we are going to be using dummy data here, so we're not showing actual results or percentages from the NSCAW data; the variable names themselves are the same. We're trying to answer the question: what percentage of children were referred for dental care? It seems pretty straightforward, right? [ONSCREEN] Screenshot of the text of a survey item. Table indicating potential responses to the survey item, where "N" is no, "S" is legitimate skip, "U" is non-interview, and "Y" is yes. [Marianne Kluckman] So this is a screenshot from the caseworker files, and we know that because the variable name starts with a C. So C is for caseworker, CI is the module, and this question asks what kind of services the child got, this one particularly about a dental exam. So: they were referred for a dental exam. And you see there are four options: N, S, U, and Y. The N and Y are pretty obvious, no and yes, so this is a character variable. And the special missing codes for this one are S and U: S for legitimate skip and U for non-interview. So this is the variable we need, and it's the only place that talks about dental. [ONSCREEN] Example table from NSCAW III depicting the weighted frequencies and percentages of dental exams in the dataset. [Marianne Kluckman] So if I do a weighted frequency on that, you can see that's 13%. So 13% of kids have a value of yes, they were referred for a dental exam. That could be our answer. But if we look at the values, we've got the S and the U in there, so those caseworkers weren't even asked this question about the dental exam. So maybe we shouldn't include those. Let's get rid of the S and the U and just give the percentage based on who was asked the question. [ONSCREEN] Example of code syntax with the following text: if CCI17ATG_28='Y' then Dental=1; /*dental*/ else if CCI17ATG_28='N' then Dental=0; /*no dental*/ Table indicating whether a child received dental services.
[Marianne Kluckman] So we're going to create a variable called Dental, and it's going to use the variable we just talked about, CCI17ATG_28. If that value is Y, then Dental becomes 1, and if that value is N, then Dental is 0. This is again a weighted frequency, and now you see the percentage jumps up to 39%. So is this the correct answer? Before we had 13 and now we've got 39, so I'm not sure. Let's take a second look at those special missing codes, because maybe we should be including somebody; that's a pretty big difference between 13 and 39. [ONSCREEN] Table indicating potential responses to a survey item where "N" is no, "S" is legitimate skip, "U" is non-interview, and "Y" is yes. Table indicating whether children received dental services, including the missing codes "S" for legitimate skip and "U" for non-interview. "N" indicates no, and "Y" indicates yes. [Marianne Kluckman] So let's look at the U's. Again, this is from the caseworker. The U's represent almost a quarter, so about a quarter did not even have this survey done. I don't know that we can make any decisions about them if they weren't even asked the survey, or didn't get to this part of the survey. So what about the S's? The S's were legitimate skips: almost half of the people were skipped out of this question based upon their answer to a different question. So let's look and see what that question was, because maybe it'll give us more information about dental. [ONSCREEN] A close-up of a question. CCI16A: Services provided/arranged for family. Text of this Question or Item: Regardless of the case decision of the investigation/assessment, have any services been referred for, provided to, or arranged for the family? Referring the family for services includes suggesting to the client that services may be needed, or giving the client provider contact information. Arranging services for the family includes contacting a provider, completing the necessary paperwork, and/or making an appointment. [Marianne Kluckman] So this is the question that caused the skip, and it basically asks, it's a long question, but: have any services been referred for or arranged for the family? If they said no to this, then they skipped the question about dental. So the question we have to answer is: do we want to include any of these people in our denominator? Because that is what we're trying to find. The numerator is easy, right? It's who said yes on the dental exam question. But who do we include in the denominator? If they said no to any service, then obviously it's no to dental, okay? For our denominator we want to look at all kids, because that was our original question: what percent of kids were referred for dental care or had a dental exam? So I feel pretty confident that if they said no here, it's going to be no to dental, and we want to include them. Now, let's say you wanted to look at the percent among only those getting services. That would be a different denominator, and that would be your 39%, but we don't want that. We want to include everybody, or as many people as we can, right? We're not going to make any assumptions about the people who didn't get the interview, but we do want to include these people who said no to any services.
[ONSCREEN] Example of code syntax with the following text: if CCI17ATG_28='Y' then Dental=1; /*dental*/ else if CCI17ATG_28='N' OR CCI16A=2 then Dental=0; /*no dental*/ Table indicating whether children received dental services excluding missing data codes. [Marianne Kluckman] So we're going to revise the creation of our Dental variable. This time, the people who get a 0, who didn't get dental, are either the ones who said no to our original dental exam question, or, and this is what I'm adding in, the ones who said they didn't get any services; that's the 2, which is no on CCI16A, a numeric variable. When we include them and run our frequency, now you can see our percent is 17. So you have to look at the special missing codes to see who was skipped, to see who your denominator is. The numerators are a lot more obvious than the denominators, but you can see they make a pretty big difference: we went from 13 to 39 to 17. Whatever you do, you need to document who's in your numerator and who's in your denominator, so you can include that information in your final report or your paper when you're talking about who is, in this case, receiving dental services. Obviously there are lots of ways to QC a created variable, but the bottom line is everything should be QC'ed. You could have another programmer look at your code for errors. You could take a sample of records, look at the raw variables and then the created variables, and make sure the coding is what you intended. Or you could take all of the data and do a list crosstab of your raw variables by your created variables, and that's my favorite way, because you get to look at every combination. [ONSCREEN] Example code with the following syntax: proc freq data=&dsout; tables cci16a*cci17atg_28*dental/list missing; run; [Marianne Kluckman] And again, this is in SAS, but there are similar kinds of things in other software packages. This is an example of what we did to QC the Dental variable that we just created. We have our two raw variables and then our created Dental variable. I'm using the LIST option so it gives me the output in list format instead of multiple tables, and I'm including the MISSING option so I can see even the missing values. I want everything included in the tables, because the values that are missing are the ones that tend to cause problems. [ONSCREEN] Output of the QC crosstab. [Marianne Kluckman] So here's the output. I've got the three variables crossed by each other. My first focus is always on who didn't get a value for my created variable, and in that case it's these first two rows, so let's look at them. The very first row is people who were skipped out of both questions, so I'm not including them. The second row is people who weren't even asked those two questions because the caseworker interview was not done, so I don't want a value of Dental for them. And then who did get a value of yes for Dental? Well, it's someone who said yes, they were referred for services, and yes, it was dental. And the rest of the people got no's: either they weren't referred for any services, or they were referred but didn't get dental. So by looking at this, I can see every combination and make sure that whatever combination it is, it gets me to the result that I want. Now, obviously, this is pretty small, right?
It's a small table: we've got two raw variables and one created variable, and the raw variables have a maximum of four categories. A lot of times you're going to be creating variables that have 10, 12, 15, I don't know how many, input raw variables that you use to create a derived variable, because remember, you may be getting them from different instruments. In a 15-variable table there are going to be lots and lots of combinations, and sometimes that can get pretty cumbersome. So when that happens, I break it up: I may create interim derived variables and check them, and then do a final crosstab of, say, four interim variables by the overall variable that I'm really interested in, and check that. It just makes the number of rows smaller to check, but you're still doing the same thing, you're still QCing the variable that you're interested in. So here are some overall best practices that we follow in NSCAW, and they are good ones to follow on any project. You always want to document the variables that you're creating, again, the numerator and the denominator, and I like to document that in a file that's separate from the program itself. You want to create a data set that is permanent and saved, to be used for your analysis task; you don't want to create it on the fly and not save it. The reason for that is, let's say six months from now someone comes back with comments on a manuscript you submitted. You want to be able to make whatever changes they're asking for in the same data set that you used originally when you created the results, with all the variables you used before, so you don't have to start from scratch again. I've mentioned previously the minimum data necessary; that's related to variables. You don't want to carry around a whole bunch of variables that you don't need. You want to pay attention to missing values when you are creating your variables and when you're QCing those variables, because a lot of times the ones that have a special missing code are the ones that you need to be the most interested in. And consider making a code repository, for example a macro program in SAS that includes the creation of variables that you may use often. If you're creating a special variable, say one related to race or one related to education, and you want to use it in multiple papers, save that code and reference it. That way you can QC the creation of that variable once, and you don't have to do it every time; you just reference the same code.
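As a minimal sketch of the kind of reusable, QC'ed-once definition Marianne is describing, here is a hypothetical SAS macro built around the Dental variable from the earlier example. The macro name and parameters are illustrative, not part of NSCAW's materials.

/* Hypothetical repository code: one vetted definition of Dental */
/* that every analysis program references.                       */
%macro make_dental(dsin=, dsout=);
  data &dsout;
    set &dsin;
    if CCI17ATG_28 = 'Y' then Dental = 1;                    /* referred for dental exam     */
    else if CCI17ATG_28 = 'N' or CCI16A = 2 then Dental = 0; /* no dental, incl. no services */
  run;
%mend make_dental;

/* Each paper's program calls the same definition: */
%make_dental(dsin=work.raw, dsout=work.analysis);

Because the recode lives in one place, a fix or a documentation update propagates to every analysis that calls it.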
Okay, so now that we've created our analysis file and we've QC'ed all the variables, it's time for analysis. We are going to talk about each of these topics next, and all of these topics are specific to NSCAW 3. [ONSCREEN] A close-up of computer code with the following syntax: proc surveyfreq; weight nanalwt; cluster nscawpsu; tables dental/cl; run; [Marianne Kluckman] So the first one is weighting. Like we mentioned earlier, you have to adjust for the sample design, and you have to use a software package that will do that for you. There are several you can use, some of them listed here: you can use SUDAAN, or you can use the SAS survey sampling procedures. Those are the ones that I use, and you can see the snippet on the screen here. So SURVEYFREQ and SURVEYMEANS in SAS will account for the sampling. You've got to use a weight, and the weight is NANALWT, and you have to use the weight for all your estimates, right? Any mean you do, any percentage you do, you've got to put it in there, otherwise you're not going to get the correct result. And to get the correct variances and standard errors, and therefore confidence intervals, you've got to identify the PSU, and that's NSCAWPSU. Like I mentioned before, we don't have a stratum like they did in NSCAW 2; it's just the PSU. So if you look at the snippet of code, you can see the WEIGHT statement, and you can see the CLUSTER statement, which has our PSU, and I'm doing a frequency of Dental, which was the variable we created before. The CL option just gives me the confidence interval. [ONSCREEN] Table indicating whether children received dental services using survey estimates. [Marianne Kluckman] So this is the output from that code. You can see it gives me the frequency, the weighted frequency, and then the standard errors and the confidence interval. And since we used the weight and the PSU identifier, it gives us the correct standard errors and the correct confidence intervals.
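The same weight-and-cluster pattern applies to any estimate, not just percentages. As a minimal sketch, here is the SURVEYMEANS analogue for a mean; SCALESCORE is a hypothetical continuous variable standing in for whatever scale you're analyzing, and the dataset name is assumed.

/* Weighted mean with design-based standard errors and CIs. */
proc surveymeans data=work.analysis mean clm;
  weight nanalwt;     /* NSCAW 3 analysis weight                       */
  cluster nscawpsu;   /* NSCAW 3 PSU identifier; no stratum in NSCAW 3 */
  var SCALESCORE;     /* hypothetical analysis variable                */
run;

Leaving out the WEIGHT statement biases the estimate itself; leaving out the CLUSTER statement leaves the point estimate alone but misstates its standard error.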
All right, I'm going to spend a little bit of time talking about subpopulation analysis. A lot of times we want to focus on a specific subpopulation, whether that's kids five and under or kids living in out-of-home care. We do not want to subset the analysis file; you have to keep all the records in your analysis file. If you remove some records, you could get an incorrect variance and standard error. There are a couple of ways you can do this. You can use a SUBPOPN statement in SUDAAN: you create a flag and you just identify that subpopulation. There's a DOMAIN statement in SAS SURVEYMEANS, and there are equivalent statements in other software packages. Another way is to create the flag and just use it in your crosstabs if you're doing percentages, and that's what I'm going to show an example of next. So my subpopulation is females aged 17. I am using two variables: CHDAGEY, which is the age in years, and CHDBIRTHSEX, which is the biological sex. There's also a gender identity variable in the file; I'm not going to use that, I'm going to use their biological sex at birth. The incorrect way we have on the left and the correct way on the right. The incorrect way would be to subset the file, like with an IF statement saying just keep records that have an age of 17 and a birth sex of 2, which is female. We don't want to do that. We also do not want to use a WHERE statement in a procedure to subset using those two variables. What we do want to do is keep all the records but create a variable to identify the subpopulation. Sometimes you may already have that variable, right? Let's say your subpopulation was females; you already have that variable and can easily identify CHDBIRTHSEX equals 2 as your subpopulation. I have to create one. So I'm creating one variable called FEMALE_AGE17, and again, it's just that their age is 17 and their sex is 2, which is female. [ONSCREEN] Table depicting the "FEMALE_AGE17" variable by age and sex. Additional rows of the table depicting the "FEMALE_AGE17" variable by age and sex. [Marianne Kluckman] Now that we've created a variable, obviously we have to QC it, right? So here's my list crosstab of age and sex and my flag variable. You can see that not all the rows are here, they didn't fit on the screen, but you can see that the only records that have a value of 1 for my created variable are females age 17. So that identifies my subpopulation: FEMALE_AGE17 equals 1 is my subpopulation of interest. Now I'm going to run my crosstab. I'm using a variable called EVERSEX, which, as it sounds, indicates whether the child had ever had sexual intercourse. Here is the incorrect way on the left, using a WHERE statement for FEMALE_AGE17 equals 1, and the correct way on the right. We don't want the WHERE statement; we don't want to limit the records in the SURVEYFREQ in any way. What we do want to do is include the flag in our TABLES statement. So now we have a TABLES statement where, instead of just EVERSEX, it's the flag for the subpopulation crossed by EVERSEX, and I'm including the row percent in my output. [ONSCREEN] A screenshot of the "SURVEYFREQ Procedure" table with summary statistics for whether a child has ever had sex. [Marianne Kluckman] So this is our output from the incorrect example, and there are two things I want you to notice at the top. The number of observations is 58. Keith mentioned early on that the number of observations in the NSCAW 3 data is 3,298, so that's very different from 58; we've subset down to 58 females age 17. And the number of clusters is 35. You can see in the output that our percent is 77%, with our standard error and our confidence intervals. So this is the incorrect example. [ONSCREEN] A screenshot of the "SURVEYFREQ Procedure" table with summary statistics for whether a child has ever had sex, cross-tabulated with the "FEMALE_AGE17" variable. [Marianne Kluckman] In the correct output, you can see we've got the 3,298 for the number of observations and we've got 61 clusters; in the incorrect run, all those other clusters were deleted from the data. It's a little bit harder to find the percent that we're interested in, but we find our flag of 1, and then EVERSEX is yes, and the row percent is 77.39, exactly the same as it was before. [ONSCREEN] A screenshot of the incorrect table. A screenshot of the corrected table. [Marianne Kluckman] Here is the output, just grabbing the rows with the percentages we're interested in, for both the incorrect and correct approaches. You can see the unweighted frequency is exactly the same, the weighted frequency is exactly the same, and the percent is exactly the same. What happens is that removing records affects the variance, and therefore the standard error. You can see the standard error of the incorrect run is 7.54 and of the correct run is 7.49. So with the smaller standard error, our confidence limits are going to be a little bit tighter in the correct run, using every single record. The way that I like to describe it to people is that SAS needs every record to compute the variance. It's like giving somebody a book: if you take out the odd chapters and ask them to write a review, they're not going to have all the information, so they can't really write the review. It's the same thing with SAS; it needs every record to give us the correct variance.
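To put the correct pattern in one place, here is a minimal sketch using the variables from this example. The dataset names are assumed, and the flag logic mirrors what was just described.

data work.analysis2;
  set work.analysis;
  /* Flag the subpopulation instead of deleting records:  */
  /* 1 for females age 17, 0 for everyone else.           */
  FEMALE_AGE17 = (CHDAGEY = 17 and CHDBIRTHSEX = 2);
run;

/* Incorrect: where FEMALE_AGE17 = 1; would drop records and  */
/* clusters and distort the standard errors. Instead, cross   */
/* the flag with the analysis variable and read the row       */
/* percents where FEMALE_AGE17 = 1.                           */
proc surveyfreq data=work.analysis2;
  weight nanalwt;
  cluster nscawpsu;
  tables FEMALE_AGE17*EVERSEX / row cl;
run;

The DOMAIN statement in PROC SURVEYMEANS (or SUBPOPN in SUDAAN) accomplishes the same thing for means: the full file informs the variance estimation while the estimates are reported for the flagged domain only.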
[ONSCREEN] Table with columns Eversex, Percent, SE of Percent, RSE calculation, and RSE as a %, with data for Eversex Yes and Eversex No. For Eversex No, RSE as a % = 33%, which is an unreliable estimate. [Marianne Kluckman] A note about unreliable estimates. When you're reporting results from NSCAW, you need to flag unreliable estimates. What is unreliable? An estimate whose relative standard error, that is, the standard error of the estimate divided by the estimate and then expressed as a percentage, so RSE = (SE / estimate) x 100, is greater than or equal to 25%. It has to be noted somehow, whether with a footnote, a symbol next to the estimate value, or some different kind of shading; whatever it is, it needs to be noted in your tables so that people understand that the RSE is greater than or equal to 25%. [ONSCREEN] A screenshot of a table indicating reported peer acceptance by race. [Marianne Kluckman] All right, the last thing I'm going to talk about is suppression, and I alluded to this earlier: any estimate that is based on an unweighted N of less than 11 needs to be suppressed from your text and tables. This is a small mockup of a table of the mean peer acceptance scale value by race. You'll see Asian kids and Native Hawaiian/Pacific Islander kids have dashes, and that's because the unweighted N's were less than 11, so we can't show the mean peer acceptance value, and we're not showing the N's. They're replaced by dashes, and I've got a footnote that just says they're suppressed because the count was less than 11. And that's it.
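As a minimal sketch of how these two reporting checks might be automated, the step below captures the SURVEYFREQ crosstab with ODS and flags each cell. The ODS table name (CrossTabs) and its column names (Frequency, RowPercent, RowStdErr) follow SAS's standard SURVEYFREQ output as I understand it, so verify them against your SAS version; the flow itself is not NSCAW-supplied code.

/* Capture the estimates, then flag unreliable or suppressible cells. */
ods output CrossTabs=work.est;
proc surveyfreq data=work.analysis2;
  weight nanalwt;
  cluster nscawpsu;
  tables FEMALE_AGE17*EVERSEX / row;
run;

data work.est_flagged;
  set work.est;
  /* RSE = (SE of the estimate / estimate), expressed as a %. */
  if RowPercent > 0 then RSE = 100 * (RowStdErr / RowPercent);
  Unreliable = (RSE >= 25);       /* footnote or mark these estimates */
  Suppress   = (Frequency < 11);  /* unweighted N < 11: show a dash   */
run;

For the Eversex No row in the mockup, an estimate of roughly 22.6% with a standard error of roughly 7.5 gives an RSE of about 33%, which is why that cell gets flagged.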
[Paige Logan] Thank you so much, Marianne and Keith. I don't have any questions in the Q&A chat just yet, but I think it makes sense to give it a minute or two for people to type up their questions. Sometimes it takes a little bit of time for the questions to come through if folks have to type them out. [Sarah Sernaker] Can I ask a question? Hi Marianne. [Marianne Kluckman] Yeah, go for it, of course. [Sarah Sernaker] I've been curious: how come stratum wasn't included in the data set this time around? [Marianne Kluckman] Okay, I'm going to let Keith answer that one. [Keith Smith] Well, I wish we had Paul Biemer on; he was the sampling statistician who drew the sample. It was just a different way that the sample was drawn, where we didn't sample by stratum this time, and I wish I had a better answer. Maybe we can provide an answer later for folks who might be interested. That's a sampling statistician question, and I would need to consult with the person best qualified to answer it. [Sarah Sernaker] Okay, I got you. That makes sense, and yeah, I know Paul Biemer is behind all the survey stuff. [Keith Smith] Yeah, the sampling. [Sarah Sernaker] Yes. [Paige Logan] So we have one question that asks: will NDACAN provide support for the NSCAW 2 data for folks who use R or SPSS? That might be a Sarah question. [Sarah Sernaker] I can respond, because I would be the one providing support, along with my colleagues. We do provide support in R and SPSS. I will say, with SPSS, for complex survey analysis you need a special add-on; it's one of those programs where complex survey analysis is not inherently built into what you might have downloaded, as far as I understand. Maybe it's standard now. And yes, we also provide support for R, and that's all free, open source. [Paige Logan] And, a clarification came through, would that be for NSCAW 2 or 1 as well? [Sarah Sernaker] Yeah, any of our data sets, so NSCAW 1, 2, or 3 as well, yep. [Paige Logan] Great. All right, we'll do a last call for any questions. I've only been on two of the Summer Training Series sessions, so I don't know if we usually end early, but, oh okay, cool, we have another question: what are the requirements to request this data? [Sarah Sernaker] I can speak to that, and maybe Andres can add to it, because he deals with the licensing specifically, but we have an application process, which I think Keith highlighted on one of the early slides. There's a page that has the eligibility and requirements, and I guess the biggest one with NSCAW is the IRB approval. There is a General Release for earlier NSCAW; there is only a Restricted Release for NSCAW 3 right now. General Release data sets have fewer restrictions, but if you're doing serious, publishable research, you should get the Restricted Release in any case. [Paige Logan] Thanks Sarah. We have another question that came in: what is the timeline for future waves of NSCAW 3? [Keith Smith] I can answer that. We're in the process of preparing the NSCAW 3 wave 2 data set and documentation. Our plan is to send that data and documentation to NDACAN hopefully by the end of August; if not, then maybe it'll slip into September. And Sarah can talk a little bit about this: NDACAN does a very thorough review of all of our documentation and provides feedback on questions they have about it, you know, if they have questions about a variable. Again, Sarah's very good at digging into the data and documentation and reviewing it. That process usually takes about a month or so for Sarah to provide feedback to us. Once that's done, there's an iteration of addressing the feedback that we got from NDACAN, and it's usually then another month or so before the data and documentation are finalized and made available to outside researchers. And NDACAN will always announce on the listserv when the data is available for researchers to apply for. Sarah, did you have anything else? [Sarah Sernaker] No, that sounds about right, all that and the timeline, yep. [Keith Smith] So within the next few months NDACAN will be releasing the wave 2 data for NSCAW 3, and it'll just be the baseline and wave 2 data for NSCAW 3; that will be all of the data provided for the NSCAW 3 cohort. [Paige Logan] And then, Keith, I think we had a follow-up question to that: will there be more than two waves, and do you know roughly when those are planned? [Keith Smith] Yeah, right now the contract just calls for two waves of data. There's always a possibility that we'll follow up with this cohort and, you know, maybe have a wave 3, but as far as the current contract goes, there will only be two waves of data. And I see the question: why were teachers excluded from NSCAW 3? The only thing I can say about that is that for this particular request for proposal from ACF, the teacher survey wasn't included, and I'm not sure what the rationale was for ACF not to include it, but we didn't include it because the contract from ACF didn't have it. [Paige Logan] Thanks Keith. Those are all of the questions in the queue. We have a compliment saying: great session, thank you. [Keith Smith] I'd just like to say thank you to everyone for joining, and thank you to Paige and Sarah and Andres for hosting. We were happy to participate and provide everyone with information about NSCAW.
[Paige Logan] Thank you, Keith, and thank you, Marianne, for taking the time and walking us through that today. Marianne, if you could go to the next slide, I think I'll just wrap up our session. Great, so yes, thank you everyone for joining us today. Our next presentation, and it will be our last presentation of the Summer Training Series, is at the same time next week, August 14th at 12:00 p.m. Eastern. We will have our NDACAN Statistician Sarah Sernaker presenting with Tammy White, who is from the Children's Bureau, and they will speak about the NYTD data set's strengths and limitations. We hope to see you all next week, and thank you so much again. Have a great rest of your Wednesday. [Sarah Sernaker] Thanks everyone. [Marianne Kluckman] Thank you, goodbye. [Keith Smith] Bye. [VOICEOVER] The National Data Archive on Child Abuse and Neglect is a collaboration between Cornell University and Duke University. Funding for NDACAN is provided by the Children's Bureau, an office of the Administration for Children and Families. [MUSIC]