[MUSIC][VOICEOVER] National Data Archive on Child Abuse and Neglect. [Clayton Covington] All right, it is 12:00PM Eastern Time on the dot, so welcome again everyone to the National Data Archive on Child Abuse and Neglect's Summer Training Series, with this year's theme being Best Practices in the Use of NDACAN Data. A couple of reminders before we get started: the presentation will begin shortly, and I'm sure you all will have plenty of questions. Please direct your questions to the Q and A box available at the bottom of your screen on Zoom, and if you have any questions outside of that or need any technical support, you can visit the Zoom help center and/or email the archiving assistant here at NDACAN, Andres Arroyo. As a reminder, the session is also being recorded. We're going to gather all questions and save them for the end, with about 5 to 10 minutes of Q and A. Next slide please. As I mentioned, this is the NDACAN Summer Training Series with the theme of Best Practices in the Use of NDACAN Data, put on by NDACAN, which is sponsored through Cornell University and Duke University. Next slide. While housed at these two universities, we are funded through the Children's Bureau, which is an office of the Administration for Children and Families within the United States Department of Health and Human Services. And without further ado, I will pass things on to our presenter, Sarah Sernaker. [Sarah Sernaker] Thanks Clayton. For everyone here, this is Clayton's last day, so if you've been here before you've heard Clayton's voice or seen his face. This will be his last presentation and day with us; he's moving on to finish his PhD at Harvard and we wish him all the best. To our series: here's the schedule I've put on the slides. We are about halfway through our summer series. Today we're going to be talking about survey design and using weights. Next week is a talk very specific to NSCAW 3, so hopefully this is a nice lead-in, because NSCAW 3 is a survey design that uses lots of weights, and this is a soft intro into survey analysis. And then the last week will be a presentation about NYTD strengths and limitations, analogous to the earlier presentations about NCANDS and AFCARS strengths and limitations. And then the summer will be over in a flash. So as I mentioned, today we're going to be talking about survey design and weighting. This topic is not specific to NDACAN data. I was really happy to think of this idea to present because we get a lot of questions about survey weights, using our data and otherwise. Our biggest survey data set is our NSCAW data; a lot of our other data sets don't actually include weights. But again, this is a very general topic and discussion about survey weights and survey design. So first we'll talk about survey sampling: what makes a survey and why you do surveys. Then survey weights and why we care. And then we'll do a brief example in Stata. I do like to tell people that R is my go-to for almost all things, but for survey analysis I prefer Stata. It has a lot of great built-in functionality and I definitely recommend starting there if you can. I also wanted to include a list of terms that will come up; if you're researching survey design you see these things a lot, and hopefully we get a chance to define them all.
So the target population is who you are hoping to survey. The sample population is the population that you actually end up sampling. Then we have primary sampling units and secondary sampling units, which are intuitively named: those are the units from which you sample, and we'll get into more detail soon. Strata and clusters also come up, and those are ways to divide the population, but we'll get into all of this over the next few slides. So, survey sampling: why do we take samples, and what's the deal with weights and surveys? The whole point is that you want to make inference about a target population. Let's say we want to make inference about all the adults in the United States. It would be almost impossible to collect data on every single adult in the United States, and so that's why we take samples: smaller groups of people that we hope represent all of the adults in the United States, or whatever our target population is. It's almost impossible to run a full census or a survey that covers your whole population. And when we take a sample we want to minimize sampling error and bias, and minimize practical things like survey time and cost, while maximizing coverage and precision. So there's a balance: we take a sample because it's not feasible to collect everyone's information, but we want to make sure our sample is, quote unquote, good enough. That means balancing practical constraints that are very real, like money, time, and person-hours, while also making sure we're taking a good sample with good estimates and not just some convenience sample, for example. Sampling is almost always done without replacement; if you're administering a survey, you're not going to give it to the same person twice. I make this point because of the last item here: samples are usually small enough with respect to the target population that one unit's probability of inclusion will not affect another's. Think about it like this: if you want a sample of adults in the whole United States, you take a small subsample. Even if you took 10,000 people, which is a pretty good sample, relative to the whole United States it's a small proportion. So when we say the sample is small enough and your probability doesn't change, think of it this way: if I sample Andres out of the whole United States, his probability of being selected is something like one out of 300 million or so, and once he's sampled, everyone else's sampling probability barely changes. That's what sampling without replacement looks like in practice. I'm fixating on this because sometimes your target population is small, say a single school district. In that case it might actually be easy to get almost everyone into your sample. If your sample is large relative to the target population, for example 90% of it, then you would need to consider a finite population correction. I'm already getting into technical material, and this is the last time I'll mention the finite population correction; I only bring it up because it comes up in the programming languages and the terminology, and I wanted to introduce what it means.
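For reference, this is roughly how a finite population correction gets declared in Stata; the variable names here (psu, samplewt, stratum, stratum_size) are hypothetical placeholders, not variables from any NDACAN data set:

* stratified design with an FPC; stratum_size holds the number of
* population units in each stratum (svyset also accepts sampling rates)
svyset psu [pweight=samplewt], strata(stratum) fpc(stratum_size)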
But I will say I've never had to use a finite population correction. It's really hard to collect a sample that covers such a large proportion of the population that individuals' sampling probabilities are meaningfully affected. So what do we mean by sampling and probabilities? There's probability sampling and non-probability sampling, and most surveys are designed under probability sampling, which says each unit has a calculable probability of being sampled. That might be as simple as: you are one person out of 10, so you have a sampling probability of 10%. This is really important because each unit having a calculable probability corresponds directly to how weights are created. We'll get into this more, but weights are essentially your inverse probability of being sampled. Whatever else is going on in the sampling and survey design, the basic fact is that each unit has a calculable probability of being sampled. The ideal situation in any survey design is a simple random sample. A simple random sample means every unit of measure in the target population has an equal probability of selection, and therefore you get unbiased representation; that's the important piece. If you can take a purely random sample, your estimates should be unbiased and your sample should be directly representative of the broader population. This is because of the law of large numbers: with large enough samples, sample estimates converge to population values. There's a lot of math underlying those statements, but the idea to take away is that this is the ideal, and it's basically what survey designers, and what weights, are trying to replicate: the case where you have unbiased representation. There are cons that come along with this, so it's ideal in the sense of statistical efficiency and estimation. But in a simple random sample you may miss very small populations because of imbalance, and in the child welfare field and the broader sociological field, the interest is usually in small groups who are not well represented in surveys. So while a simple random sample is the ideal for statistical efficiency, it's not really practical in real-life settings: not practical for special research interests, and not practical for planning, implementation, or cost. Just think about trying to take a truly random sample of the whole United States; the planning and costs would be enormous. And in reality there are usually heterogeneous response rates by geography, survey method, and demographics. Especially under the broad sociological or psychological umbrella, people are not random, and small groups that are not being represented often do have significantly different views. So again, a simple random sample is the ideal as far as statistical efficiency and what weights ultimately adjust toward, but in reality almost every survey you come across will not be a simple random sample, for various reasons. Other methods that are super common are stratified sampling, which is when you divide the population into homogeneous, mutually exclusive groups.
So that's the definition of strata, and within those strata independent samples can be taken. This ensures adequate sample sizes for subgroups of interest and increases precision; you're increasing your statistical precision by making sure your smaller or imbalanced groups are represented with large enough Ns to support valid conclusions. Thinking about sampling all the adults in the United States, if you really wanted to do that, stratified sampling would help break up the work. Instead of taking on the nation as one task, a stratified sample would say: some natural strata are the states, and within each state you might take a random sample. And I'll talk about this in a second, but you can have multiple levels of strata: you could stratify by state, then by county, then maybe by school district, or you could stratify in other ways. Again, it's just dividing the population into homogeneous groups so that when you do sample, you get good representation from the units in your strata. There's also something called cluster sampling, and the difference is nuanced. Cluster sampling is when you divide the population into groups, or clusters, and then randomly select a number of clusters, from which all units are then included in the sample. This is a little different from stratified sampling: in stratified sampling, hold in your mind the picture of the country divided into states or counties, and within each state, say, each state administers its own random sampling scheme, but not everyone in the state is sampled. In cluster sampling, think of the school district example I briefly mentioned. Say you want to sample within a state's school districts. You might make a list of all the school districts in the state, but you don't have the funds to survey all of them. So you take a random selection of school districts, those are your clusters, and within each chosen cluster you try to survey every single person or unit, maybe every teacher. That can definitely be feasible if the clusters are at a level where you can sample everyone within them. Coming back to the balance of planning and costs, cluster sampling is just a different approach than stratified sampling. You have mutual homogeneity but internal heterogeneity: school districts function pretty similarly at a high level, but if you're sampling a rural school district versus an urban school district, there's going to be heterogeneity within them. And again, this increases sampling efficiency; these are all methods to increase sampling efficiency. These are the two most popular sampling designs. There are a few others, like systematic sampling, which I won't get into, but these two are really the main ones. And when you see a survey you might see that there are multiple stages. My point is that you can use cluster and/or stratified sampling multiple times, and you can use them together: one and then the other, and vice versa.
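To make the multi-stage idea concrete, here is a minimal Stata sketch of declaring a two-stage design with states as first-stage strata, counties as the sampled clusters, and school districts sampled within counties; all of the variable names (county, schooldist, finalwt, state) are hypothetical:

* stage 1: counties sampled within state strata
* stage 2: school districts sampled within the selected counties
svyset county [pweight=finalwt], strata(state) || schooldist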
But you can combine these sampling strategies to divide up your population in a way that maximizes your research utility while minimizing cost, to capture the sample that's going to answer your research question. When you take a probability sample you're inherently going to have bias, and that bias comes up in different ways. There are three main types. First, nonresponse bias. This applies mostly to surveying humans, but it can occur at an organization level too. Nonresponse bias arises when you try to include someone in your sample, you send out a survey, and they don't respond. The issue is that those who don't respond are usually characteristically different from those who do: maybe they're grumpier, or don't trust researchers as much, or don't trust big government. Think of people who don't answer when you call, whose personality differs from your friend who answers and calls you all the time. That is nonresponse bias: the units that don't respond are characteristically different from those who do. There's also selection bias: some units have a different probability of selection that is unaccounted for by the researcher. This is really on the surveyor, and it's what you plan for in your survey design; that's why sitting down to think through a survey design is so crucial, making sure you're dotting your i's and crossing your t's, making sure everyone in your target population is sampled or accounted for, and that you're not omitting small populations or overlooking things like that. That leads into coverage bias, which is when some population members do not appear in the sampling frame at all, so undercoverage. For example, something that's taken for granted, even in my own research: when you give out surveys, who can't respond? People who don't have a phone, if you're doing phone surveys. People who don't have an address, if you're sending out notices. So the homeless population is often excluded, and those who are incarcerated are excluded, just by virtue of being out of reach of the surveying technique. All of these biases overlap in different ways, but the idea is that you can only do so much in a survey design; you can only sample so well, only account for people responding or not, only bother them so many times to respond. These are the biases that come up in probability sampling, and ultimately why we have survey weights. So, survey design, before we get into the weights section. Survey design is really just thinking about all of the things I've said: your research goal, who your target population is, and then defining your limitations and your reach, how you will reach people or units, and putting together a plan to implement. And let me pause real quick. I say designing a survey, but I've never designed a survey and you may never design a survey, so why am I talking about it? Because it's really helpful to understand how surveys come together.
It's helpful not only for the weights, understanding how best to use them and what they can ultimately get you, but also for working through the whole flow of the survey: understanding how some units might have been included and some excluded, and knowing the terminology. So I've never designed a survey, but it's a good broad framework to keep in mind when you're dealing with surveys. First, defining the target population, which I've mentioned a few times now. What is the target population? It is the largest encompassing group of all units to which inference and conclusions can be made. Any time I get survey data I ask, what is the target population? That is who your survey is supposed to cover. In my example of sampling all the adults in the United States, my target population would be adults in the United States. Sometimes your target population is, say, children age zero to five who live with their mothers, or children who have experienced foster care. That is who you want to make inferences and ask research questions about at the broadest level. You acknowledge you might not reach everyone, but that is who you would be taking a census of, if you could take a census. It's important to identify when you're working with a survey because, when you're making conclusions and inferences, that is the population you're making inference about. It's not about anyone else, maybe subpopulations within it, but the survey is directly representative of that population, or tries to be, and not anyone else. So it very clearly defines who you're talking about, and the target population is an important thing to define at the get-go. Then you would define your sampling frame and design. That means defining strata or clusters that the whole target population can be divided into. This is almost always based on geography, or geography almost always informs it: state lines, county lines, neighborhoods, I've even seen city blocks. A physical grouping of people, because people who live near each other tend to live similar lives, and state laws matter too, so geography is always a good place to start at the very least. Sometimes sampling frames, strata, and clusters are defined on demographics. I don't know how much detail I'll get into with NSCAW, but one of NSCAW's features is that they create sampling domains. These are basically strata: they've defined populations in which they want to make sure they sample enough people, and they define these domains based on demographics. They say, we're going to sample children one to five who have had this experience, and all the other domains are mutually exclusive. So strata and clusters are not strictly defined on geography; they can be defined on demographics or other characteristics of interest, especially for research purposes. You can also define second-stage strata or clusters, which would be smaller units within your main strata. Like I said, think of states as a first-level stratum, and then within states maybe you're surveying child welfare agencies, for example. Again, this is just to organize how the survey is collected and how people are being sampled.
Define any additional sampling clusters; this is a bit redundant, because it's really just another stage of strata or clusters. Think of the highest levels and then whittling your way down to a feasible sampling unit. Like I said, state, and then maybe you choose child welfare agencies as your second-stage strata. Within child welfare agencies you split based on domains or demographics, and then you sample people from those smallest units. Once you've created your frame of strata and clusters and domains, you need to define the primary sampling unit that will be randomly sampled. Once you've gotten down to the bottom level, say I'm sampling within child welfare agencies as my second-stage strata, some random sampling needs to happen, so random sampling of children or of families. This is where you should be informed, whether you're creating or using a survey: is the primary sampling unit a single child within a household, for example? Could siblings be sampled within a household? Or is a household chosen and then a family member randomly selected from the household? These are all things I've seen in surveys. It's important to understand who the primary sampling units are for a few reasons. The primary sampling unit may not align exactly with what you're hoping to research: for example, if you're hoping to look at siblings but children are individually and randomly sampled, you might not capture siblings, versus if you randomly sampled a family unit. So primary sampling units may not align with your research goals, and you might want to work at a different level. Also, coming back to who you are making conclusions and inferences about: if the primary sampling unit is a family versus an individual child, you have to be careful with how you interpret and understand the data. And a survey design is decided and set before data collection begins, and it should remain unchanged if there are multiple waves. Sometimes surveys are given out once and that's it; sometimes, as we'll see with NSCAW, surveys are longitudinal and multiple waves are collected. But a survey design is not an ever-changing thing; it's defined before people are even sampled, and it's planned based on prior research, prior information, funds, etc. The point of doing it beforehand is that if you start adjusting a survey design mid-survey, you're injecting bias, because you're adjusting your design based on information you're collecting in the current survey. You'd be biasing your estimates, the first half of the survey wouldn't be comparable to the second half, and you'd be introducing a lot of chaos into the survey. So survey designs are decided well before data collection, for planning purposes and also for statistical bias and efficiency. Oh man, I'm definitely going slower today, I just realized, halfway through. Okay, so a quick example to anchor our discussion: one of our data holdings is called the National Survey of Child and Adolescent Well-Being. This is a longitudinal survey; there are actually three cohorts, NSCAW 1, NSCAW 2, and NSCAW 3, which is our latest, and each of these NSCAWs has multiple waves.
So NSCAW 2 was a longitudinal survey with three waves, and the target population is all children in the United States who are subjects of child abuse or neglect investigations conducted by child protective services, except those in eight states where laws interfered with survey administration and which were thus removed from the sampling frame. So broadly, the target population of NSCAW is children in the U.S. who are subjects of child abuse or neglect investigations. If you use NSCAW data and its survey weights, this is who your conclusions and inferences would be about; this is the population about which you're making conclusions. That parenthetical is a nuance of NSCAW that I won't get into: eight states were removed because there were too many issues surveying them, so the survey was designed without those states, and because they were omitted from the survey design they are not included in the target population. This is what I mean about defining your target population and understanding who you're talking about when you present your analysis and conclusions. NSCAW 2, and similarly NSCAW 1 and 3, are multi-stage stratified designs. They took the U.S. and divided it into nine strata. Eight correspond to the largest states, think California, Texas, Florida, largest as in most populous, I believe, and the ninth stratum is all remaining states. Within the strata, the primary sampling units were geographic areas that encompass the population served by a single CPS agency. Basically that's a county; in some places, like New York City, there are multiple CPS agencies, or a CPS agency covers multiple counties, but put simply they almost always equate to a county. So within the strata the primary sampling units are counties. They sample counties within the strata, and then within the counties all children are categorized into five mutually exclusive domains, basically another stratum level, and then randomly sampled within the domain. They create the five mutually exclusive domains based on demographics and sample children based on that. That's just to anchor things, because I have an example in Stata with NSCAW 2; it gives a sense of a practical target population and a multi-stage stratified design. But let's put a pin in NSCAW for now and talk about weights. I really like this quote because I can relate to it, and I feel like it sums up the sentiment of working with weights. Andrew Gelman is a really famous Bayesian, I think he's at Columbia or NYU, a really big name, and he has a paper, which I have in the references at the end, about survey weights, when to use them and why, and the very first sentence is "survey weighting is a mess". If you've ever worked with survey weights and were confused by them, then you're probably doing it right. There's broad consensus on how to use weights, but I feel like there's also a lot of uncertainty when using them, even among statisticians such as myself. And for each research question you might be using weights differently. That's why it can be a tricky conversation, and hard to wrap your mind around why and when we use weights. So why do we use weights? What are survey weights?
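As a preview of the worked example at the end of this session, that whole multi-stage design boils down to three variables delivered with the NSCAW 2 file, and the declaration in Stata is a single line (these are the actual variable names used in the later example):

* stratum = the nine state-based strata, nscawpsu = the county-level PSUs,
* nanalwt = the analysis weight provided with the data
svyset nscawpsu [pweight=nanalwt], strata(stratum)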
So like I said before, when we take a survey we're taking a sample of a population; we're not capturing every single person in the population, that's the whole idea, but we want to make inference about the target population. What survey weights do is help us scale our analyses up from the sample level to the broader population level, and help ensure that conclusions and inferences are applicable to the whole target population. They adjust for survey design error and bias. Like I said, it's inevitable; you're not going to create a perfect survey with no bias and no error. You can do your best to minimize it, but ultimately you may have problems in your survey, foreseen or unforeseen, and weighting is a way, at the end of the survey once you have data, to help adjust for those design errors and biases. Weights can be constructed with many different adjustments to account for the various errors or biases we'll get into. Without survey weights, say you're using NSCAW and you run a linear model without knowing what survey weights are, you'll get estimates, but the problem is that your standard error estimates are going to be smaller than they should be, and you might find significant results where there really aren't any, strictly attributable to those too-small standard errors (there's a small side-by-side sketch of this right after this discussion). That's because your programming language assumes, unless you tell it otherwise, that your data come from a simple random sample, that is, that the data are i.i.d., when survey data generally are not, because of the survey design and the selection bias and the nonresponse bias and so on. So without accounting for the survey design and weights, your standard errors will be wrong. You might find that with and without survey weights your coefficient estimates don't look too different, and that's actually usually the case, but it comes down to the standard errors: if you don't adjust for the survey design and use the weights, they're going to be too small. The rule of thumb is to use survey weights if they're available. Survey weights are almost always recommended for descriptive statistics such as means, proportions, and general counts. There's less consensus about a blanket rule of always using survey weights in statistical models. That's what Andrew Gelman's paper discusses, when you might want to use weights in statistical models, and it depends on many factors: the model you're using, the covariates you're using, whether your covariates were used in the sampling frame, and so on. So weights are almost always recommended for basic descriptives, but there's less consensus about using them in statistical models. Some fields generally don't like survey weights and some fields love them; I've heard that economists don't really like survey weights. That's what I mean about the weight debate being ongoing. But what do survey weights do? As I quickly mentioned, they compensate for estimation bias from unequal selection probabilities, nonresponse, population coverage, and any survey administration issues.
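The standard error point above, as a minimal side-by-side sketch; the variable names (outcome, x1, x2) are hypothetical, and the comparison assumes the design has already been declared with svyset:

* unweighted model: treats the data as a simple random sample, SEs typically too small
regress outcome x1 x2
* design-based model: uses the declared weights, strata, and PSUs
svy: regress outcome x1 x2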
For example, NSCAW 3. Even if you can't get the data, you might want to get the user's guide if you're interested in learning more about surveys. They ran into a lot of problems in NSCAW 3 and actually had to revise their sampling frame midway to account for a lot of nonresponse; a lot of states just did not want to participate this time. Ultimately you come back to the problem of not having enough data, so you need to keep sampling, and while it's not ideal to change a sampling frame midway, sometimes the survey administration necessitates it. So survey weights are designed to compensate for all of these types of bias, and every primary sampling unit with a valid observation gets a survey weight. Survey weights are not just for some subset of people; every unit in your sample should have a survey weight. Final analysis weights are usually the cumulative product of multiple adjustments, one for each stage of sampling. So if you've ever ordered data, found several weights in there, and wondered which weight to use: there's usually a quote unquote final weight, and that is just a bunch of adjustments multiplied together to account for all of these avenues of bias. When you construct final weights you almost always start with what are called base weights, which adjust for the inclusion probability during sampling. Remember I said every unit has a calculable probability of being sampled. For example, say you define demographic groups by sex, race, and age, and you have population counts for them. If I wanted to sample adults in the United States and I took a sample, I could use Census data that says, for instance, there are a thousand white women in Maryland, and in my sample I end up with 10 white women from Maryland. Then 10 out of a thousand, a one percent chance, is the calculable probability that any given white woman from Maryland was chosen. So it does rely on having some base counts, and usually you come back to Census counts to get calculable probabilities. But you have some base population from which you can create or estimate sampling probabilities for your units, and the base weight simply accounts for this inclusion probability: as I mentioned before, it's the inverse probability of being chosen, and if you were less likely to be chosen, your weight will be larger. I like to think of it like this: if I sampled 10 white women from Maryland out of the thousand white women who live in Maryland, those 10 women I've sampled ultimately represent the whole thousand, including the 990 who weren't sampled. They are my representative cases, and so the weight each would get is 1 divided by (10 over 1,000), so basically a thousand over 10, which is 100. This is all to say that if you sample someone from a group that's harder to capture, their inclusion in the survey gets a higher weight; they're representing more people, basically.
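A minimal sketch of that base weight arithmetic in Stata, assuming hypothetical variables holding the Census count for each cell (pop_count) and the number sampled from that cell (n_sampled):

* inclusion probability and base weight for each sampled unit
gen prob_incl = n_sampled / pop_count    // e.g., 10/1000 = 0.01
gen base_wt = 1 / prob_incl              // e.g., 100: each sampled woman stands for 100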
And so that's really what it comes down to: you can intuitively think of a weight as saying, this is the person in my sample, I know there are 99 others out there just like them, and this person represents them; we assign a weight based on essentially how many other people this person represents. So my weight is going to be higher if I'm representing more people rather than fewer. Intuitively, weights are a measure of how many other units I represent, or am providing information for, in the sample. You need estimates of the number of units in the target population, also called the reference population, within each defined stratum, and again you usually get that from the Census, which is good enough in most cases. Let me keep going. The base adjustment is almost always the first thing you adjust for, because everyone has a selection probability, and it's the simplest adjustment: I've sampled this person from this group, and they actually represent ten times the people we couldn't sample, or whatever the number is. So that's your base weight, call it b. Then you have additional weight adjustment factors, calculated to adjust for other biases such as nonresponse, so you can incorporate the probability of nonresponse based on characteristics. Let's say I have my roster and, I don't know, white teenage boys were less likely to respond, and you know that from the characteristics of the survey; this is all happening after the fact, so you have some information about who didn't respond. Now you're saying, this white teenage boy did respond to our survey, so he's actually representing a lot of people, and he gets a higher nonresponse adjustment because that group is notoriously bad at responding. So that's another layer of adjustment in the weight construction; call the nonresponse adjustment r. If that's where you stopped, if those were your only two adjustments, your final weight would be your base weight times your nonresponse adjustment (sketched briefly after this discussion). So these weights are constructed as adjustments for each piece of the sampling design. It doesn't usually stop at just two adjustments; other weight adjustment factors could address any survey problems. Again, the NSCAW 3 user's guide is a really good document that goes through all the survey problems they had and how they adjusted for them; I think they have something like six adjustment factors in NSCAW 3. Anything that necessitates revising the original survey design and/or resampling should have a weight adjustment, because there's going to be bias and a lot of uncertainty if you have to revise your survey design. An extended duration of survey administration, making it hard to set a reference population or time point, can also induce bias or changes in response. Something I'm not going to touch on much, but the timing of your survey matters: if it's taking you three years to ask a group of people the same set of questions, say you started in 2019 and then Covid hits, your responses may look very different after the fact.
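The two-adjustment example mentioned above, as a tiny Stata sketch; base_wt is the hypothetical base weight from before and nr_adj is a hypothetical nonresponse adjustment factor already calculated for each respondent:

* final weight = product of the stage-wise adjustments (here just two)
gen final_wt = base_wt * nr_adj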
All of these things about administering a survey really affect the results, and should be thought through carefully if you're creating weights or a survey design. The thing with weights is that the weights themselves can have large variation, which ultimately interferes with analysis. It's a double-edged sword: you're creating weights to adjust for problems you had, but then the weights create some problems of their own. So you construct these weights, and what often happens is that some of the adjustments are simply calibrating and adjusting the weights themselves, whether trimming or winsorization, terms that come up; this helps reduce the bias and variability coming strictly from the weights. Because again, the whole point of sampling and trying to make inference about a population is to balance precision and bias, you want the best estimates, and weights unfortunately can themselves inject bias and variability, so you have to be careful balancing that on the weight side too. So, in creating weights, there are some things done after you've adjusted for base weights or nonresponse or attrition, when you have your final weight but find there's too much variability. One option is smoothing: you can create model-based weights using observed survey quantities, so you use a model and the survey information to reassign some weights, get rid of some of those extra-large weights, and smooth out some of that variability. A really popular one is called calibration, or post-stratification. The idea is that when you add up the survey weights across the sampling frame's strata and domains, they should actually add up to the target population. Like I said, the whole idea with weights is that one person may represent, say, 100 people, so the weights are a multiplier, an implicit, intuitive way of saying this person actually represents this many people, and if you've used your weights correctly, simply adding them up should give you population-level totals. Sometimes they don't, because you have a lot of adjustments: maybe you do one adjustment, then calibrate, then do another adjustment and have to recalibrate. When you do calibration or post-stratification, you're creating another multiplier to ensure that all of the weights add up to the population totals. These multipliers are not usually that big; by the time you're post-stratifying you've already done a lot of adjustments, so your weights are usually pretty close, and this is a post-hoc way to account for the difference. And when you post-stratify, it decreases the bias due to nonresponse and under-represented groups.
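As a quick sanity check on the idea that the weights should add up to the population, and to show where post-stratification information goes, a minimal Stata sketch with hypothetical variable names (final_wt, psu, stratum, pscell, pscount); svyset does accept post-stratification cells and their population counts through its poststrata() and postweight() options:

* the weighted total of a constant estimates the size of the target population
gen one = 1
svyset psu [pweight=final_wt], strata(stratum)
svy: total one
* declaring post-stratification cells and their population control totals
svyset psu [pweight=final_wt], strata(stratum) poststrata(pscell) postweight(pscount)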
And that is again because you're ensuring that the weights add up to a population-level metric. You're not just throwing weights in and saying, okay, we'll add these up and maybe we end up with five people from our under-represented groups; you're ensuring that the weights add up to the population-level numbers for all of your groups, and especially your under-represented groups. The other method is trimming, or winsorization. This is for when you have extremely large weights; depending on response and so forth, some weights are just enormous, beyond, say, the 99th percentile. What trimming and winsorization do is basically top-coding: you say anything above this value is just going to be set to this value (there's a small sketch of this top-coding step at the end of this discussion). And all of this calibrating can go back and forth; you can see how maybe you post-stratify, but some weights are too extreme so you trim them, but then you have to re-post-stratify to make sure you match the population, and sometimes you accept that it's close enough, because this is all a delicate balance of bias and variability. Trimming reduces variability because you're reducing those extreme values, but it increases bias: if you post-stratify and then have to trim, your weights no longer add up to the population level, and you're back to bias. So it's a very delicate balance, a lot of adjustment goes in, and the adjustments depend on the researcher and whoever is creating the weights. Another big thing to keep in mind is that when you have weights in a multi-wave survey, it can be hard to know which weight to use. In NSCAW, for example, there are weights for wave one, weights for wave two, weights for wave three, and in some cases, like NSCAW, there are weights for waves one and two, weights for waves one, two, and three, and weights for waves one and three. When you have a multi-wave survey there are going to be many weights. If you've ever looked at NHANES or the National Longitudinal Survey of Youth, they have so many weights, and which weights you use really comes down to your research question, who you're trying to make inference about, and what models you're going to use. It also depends on what I call the pathway someone can take through the waves of data. For example, in NSCAW 3 someone could be sampled at wave one, not at wave two, and then be sampled again at wave three. If you're using data from waves one, two, and three, you're just not going to have wave two for some of your people. Or if you're interested in people who responded to waves one, two, and three, that's going to look different from the pathway of people who skipped wave two. So all of these things: how does someone progress through the survey, how is the survey administered, and how does it relate to my research question and the variables I'm using? Each research project could use the same weights very differently, and again, that's why weights are always confusing and there's no broad blanket statement. Okay, so this is what I touched on briefly: the choice of weights in a multi-wave survey depends on the estimate, the path someone can take through the waves, how they responded, and their eligibility. Let me keep going. So, survey analysis in programming languages.
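The top-coding step referenced above, as a minimal Stata sketch; final_wt is the hypothetical weight from earlier, and the 99th-percentile cutoff is itself a judgment call:

* cap weights at the 99th percentile (winsorize from above)
_pctile final_wt, p(99)
scalar cutoff = r(r1)
gen final_wt_trim = cond(final_wt > cutoff & !missing(final_wt), cutoff, final_wt)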
I know I only have two minutes, or one minute now, and then I'll stop for a question. Surveys in programming languages: you have to tell the programming language that you're using a survey design and survey weights. Stata, like I said, in my opinion has been super great and has amazing built-in functionality for dealing with surveys. You basically tell Stata: this is my survey, these are my weights, these are my strata, and then you just include a prefix on basic estimation commands, for example regress, and it accounts for everything behind the scenes; it's very simple and straightforward. In R you need to use what's called the survey package; I don't think there's built-in functionality in R for survey design, it's really all through the survey package, and I've included these references which I highly recommend checking out. SPSS doesn't have adequate built-in capability; they have what's called the Complex Samples add-on for survey analysis, which is what you would use if you're analyzing NSCAW or any complex survey. They have a built-in weight function, but it's not very good; it doesn't apply to complex surveys such as NSCAW. And SAS has its SURVEY procedures; I don't use SAS much so I can't speak to that. When you're working in your programming language and you want to work with a subsample, say just males or just females, you would not drop the rest of the data in your program. You want to keep everything in memory and specify to your programming language what the subpopulation is. There's built-in functionality in all these languages to specify a subpopulation, and that's the right way to do it. You don't just drop all men or all women and then run analyses as usual. The reason is that if you just drop observations from your data, you're dropping people from strata, and you might, though it's unlikely, drop whole strata. That affects the survey design: it no longer coincides with the true survey design or with the weights that were created based on that design. I wanted to make this point because it comes up a lot. If you're analyzing a subsample, all you need to do is create a variable defining your subpopulation, such as one if male, zero if female, or one if you're between certain ages and zero otherwise, and you specify to your programming language that this is your subpopulation, and it will do the rest (a small sketch of this follows the svyset example below). This is my PSA: do not drop your data when you're working with a survey. Just don't drop it; you're going to cause problems. [ONSCREEN] Screenshot of sample code for defining the survey design. Actual text is:
. svyset nscawpsu [pweight= nanalwt], strata(stratum)
Sampling weights: nanalwt
VCE: linearized
Single unit: missing
Strata 1: stratum
Sampling unit 1: nscawpsu
FPC 1:
[Sarah Sernaker] I know I only have a few minutes, so I'll just quickly show you a Stata example with NSCAW 2. This is where you define the survey design. Like I said, you just need to tell Stata: this is my survey design, and these are the relevant variables within NSCAW. Within NSCAW there's a psu variable, the primary sampling unit, there's a weight variable, and there's a stratum variable. And notice this is pweight, for probability weight. If you're doing survey analysis, these should be in your data.
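A minimal sketch of the subpopulation approach described above, run after the svyset shown in the screenshot; the coding of chdGendr (1 = male) is an assumption for illustration, so check the NSCAW codebook, and the analysis variable is just a placeholder:

* flag the subpopulation instead of dropping observations
gen male = (chdGendr == 1) if !missing(chdGendr)
svy, subpop(male): mean somevar    // somevar is a hypothetical analysis variable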
You should have variables explicitly defining the psu, the weight, and the stratum. If you don't have variables defining these things, there's nothing to declare; it isn't set up as a survey. Whatever survey you're working with, you would substitute its variables here. [ONSCREEN] Screenshot of Stata sample code for calculating the proportion of children in Wave I of NSCAW II data who are male and female.
svy: prop chdGendr
[Sarah Sernaker] Moving quickly, and we can make these materials available to users, but I know I'm running out of time as usual. Once you've run the svyset, like I said, all you need is the svy prefix and Stata will take care of the rest. Notice here, I've circled it: if I run a basic proportion, look at the number of observations. There are only about 6,000 children in the sample, but with these weights, the population size is much larger. That's my target population. [ONSCREEN] Screenshot of Stata code and output showing the survey-prefix gender proportion estimates, with a red circle around the linearized standard error estimates, which are larger than the non-linearized estimates.
svy: prop chdGendr
Screenshot of Stata code and output showing the gender proportion estimates without the survey prefix, with a red circle around the standard error estimates, which are smaller than the linearized estimates.
prop chdGendr
[Sarah Sernaker] That's why we use the weights: they bring you up to the population level. Then specifying a subpopulation; I'm going to go through this quickly because I have, inevitably, run out of time. So I'll leave up the references and stop there. [ONSCREEN] LIST OF REFERENCES
Lumley, Thomas. Complex Surveys: A Guide to Analysis Using R. John Wiley & Sons, 2011.
Lohr, Sharon L. Sampling: Design and Analysis. Chapman and Hall/CRC, 2021.
Stata Survey Data Reference Manual (PDF): https://www.stata.com/manuals/svy.pdf
Gelman, Andrew. "Struggles with survey weighting and regression modeling." Statistical Science 22.2 (2007): 153-164.
Bollen, Kenneth A., et al. "Are survey weights needed? A review of diagnostic tests in regression analysis." Annual Review of Statistics and Its Application 3 (2016): 375-392.
[Sarah Sernaker] And let's see if we can get some questions answered quickly. [Clayton Covington] We only have a minute, so let's do one question and then we'll wrap up. The question asks: how do you use survey weights for descriptive statistics in Stata? "I know using weights in a regression is really simple; I've just never figured out how to use weights when calculating summary statistics." [Sarah Sernaker] Yeah, so it depends on what you want to use, but let me go back to this slide. [ONSCREEN] Screenshot of Stata sample code for calculating the proportion of children in Wave I of NSCAW II data who are male and female.
svy: prop chdGendr
[Sarah Sernaker] Really, once you define your survey set here, you just need the svy prefix, and most descriptive functions work with it. [ONSCREEN] Survey Functions in programming languages.
Need to define the sampling frame to your programming language – strata, (primary and secondary) sampling units, probability weights.
Stata: svyset and the svy prefix. https://www.stata.com/manuals/svy.pdf
R: survey package. https://stats.oarc.ucla.edu/r/seminars/survey-data-analysis-with-r/
SPSS: Complex Samples add-on for survey analysis.
SAS: the SURVEY procedures. https://stats.oarc.ucla.edu/sas/seminars/sas-survey/
[Sarah Sernaker] I'm going to scooch along. I really highly recommend the survey manual in Stata, because it tells you which functions are compatible with the survey settings, and it covers almost everything you would need, all of your descriptives. Once you define your survey set, all you need is the prefix and most descriptive functions work as usual: prop, total, mean; you just need the svy prefix after defining your survey. [ONSCREEN] Screenshot of Stata sample code for calculating the proportion of children in Wave I of NSCAW II data who are male and female.
svy: prop chdGendr
[Clayton Covington] All right, well, Sarah, if you could go to the last slide. We're going to end our questions there, but Sarah's contact information is available if you have follow-up questions. As for next week, we'll be at the same time and place here on Zoom, 12PM Eastern Time, where we'll have a presentation from our partners at RTI International covering that third cohort, NSCAW 3, with tips and pointers for both experienced and new users. With that, I want to thank all of you for attending our session, and thank all of my colleagues at NDACAN for what has been a wonderful five years; I wish you all the best moving forward. That ends my time here, and Paige Logan is going to be assuming the role of Graduate Research Associate, so keep an eye out for emails from her and other communications like you would typically see from me. Thank you all again, and I hope you all have a wonderful time. [Sarah Sernaker] Thank you. [VOICEOVER] The National Data Archive on Child Abuse and Neglect is a collaboration between Cornell University and Duke University. Funding for NDACAN is provided by the Children's Bureau, an Office of the Administration for Children and Families. [MUSIC]