Dynamic Panel Models 

 I have been toying around with dynamic panel models from the econometrics literature and I have run up against a key set of assertions. First, a quick setup. The idea with these models is that we have a set of units which we measure at different points in time. For instance, perhaps we survey a group of people multiple times over the course of an election and ask them how they are going to vote, whether they plan to vote, how they rate the candidates, etc. We might then want to know how these answers vary over time or with certain covariates.  

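 Here is a typical model, for the response y of unit i in period t with covariates x: 

     $$ y_{it} = \rho \, y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it} $$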

 There are two typical features of these models that seem relevant. First, most include a lagged dependent variable (LDV) to account for persistence in the responses. If I was going to vote for McCain the last time you called, I'll probably still want to do that this time. Makes sense. Second, we include a unit-specific effect, alpha, to account for all other relevant time-invariant factors. Dynamic panel models tend to identify their effects by first-differencing, that is, by estimating the following model: 

     $$ y_{it} - y_{i,t-1} = \rho \, (y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}) $$

 Differencing eliminates the unit-specific effect, but our parameters remain, ready to be estimated. I should note that there are some identification issues left to solve: the differenced LDV is correlated with the differenced error term, and the differences between estimators in this field mostly have to do with how to instrument for the differenced LDV. 
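
 To see why the instrumenting matters, here is a minimal simulation sketch. Everything in it is my own assumption for illustration: there are no covariates, the true LDV coefficient is 0.5, the errors are standard normal, and the function names are made up. It compares OLS on the levels (ignoring the unit effect), OLS on the first differences, and an Anderson-Hsiao style estimator that uses the twice-lagged level as an instrument for the differenced LDV. That instrument is one of the simplest choices, not necessarily the one any particular paper uses.

```python
import numpy as np


def simulate_panel(n_units=20000, n_periods=8, rho=0.5, burn_in=50, seed=0):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + eps_it (no covariates)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_units)      # unit-specific effect, fixed over time
    y_prev = alpha / (1.0 - rho)          # start each unit at its long-run mean
    columns = []
    for t in range(burn_in + n_periods):
        y_prev = rho * y_prev + alpha + rng.normal(size=n_units)
        if t >= burn_in:                  # keep only the periods after burn-in
            columns.append(y_prev)
    return np.column_stack(columns)       # shape (n_units, n_periods)


def slope(y, x, z=None):
    """No-intercept slope of y on x; if z is given, it instruments for x."""
    z = x if z is None else z
    return float(np.sum(z * y) / np.sum(z * x))


def estimate_rho(y):
    """Return three estimates of rho from a (units x periods) array."""
    dy = np.diff(y, axis=1)               # dy[:, k] = y[:, k+1] - y[:, k]
    d_now = dy[:, 1:].ravel()             # differenced outcome
    d_lag = dy[:, :-1].ravel()            # differenced LDV
    y_lag2 = y[:, :-2].ravel()            # level two periods back (instrument)
    return {
        # Levels regression that ignores alpha_i entirely.
        "levels_ols": slope(y[:, 1:].ravel(), y[:, :-1].ravel()),
        # Differencing removes alpha_i, but the differenced LDV and the
        # differenced error share the period t-1 shock.
        "diff_ols": slope(d_now, d_lag),
        # The twice-lagged level predates both shocks in the differenced error.
        "diff_iv": slope(d_now, d_lag, z=y_lag2),
    }


if __name__ == "__main__":
    print(estimate_rho(simulate_panel()))
```

 Under these assumptions, the levels estimate should come out too high, the differenced OLS estimate should be far too low, and only the instrumented estimate should land near the true 0.5.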

 Reading these models, I have two questions. One, is there a reason to expect that we need both an LDV and a unit-specific effect? Including both means that we expect a shock to a unit's dependent variable that is constant across periods, over and above whatever persistence the LDV carries forward. I find this a strange assumption. I understand a unit-specific shock to the initial level and then using the LDV thereafter, but in every period?  
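
 To make the distinction concrete, here is a sketch of the algebra. I drop the covariates, write \rho for the LDV coefficient with |\rho| < 1, and write \varepsilon for the period shocks; all of that notation is my assumption. With a unit effect in every period, substituting backwards to period 0 gives the first line. With a unit-specific shock only to the initial level, y_{i0} = \mu + \alpha_i + \varepsilon_{i0}, and y_{it} = \rho y_{i,t-1} + \varepsilon_{it} thereafter, we get the second line.

```latex
\[
y_{it} = \rho^{t} y_{i0} + \alpha_i \, \frac{1 - \rho^{t}}{1 - \rho}
         + \sum_{s=0}^{t-1} \rho^{s} \varepsilon_{i,t-s}
\]
\[
y_{it} = \rho^{t} (\mu + \alpha_i + \varepsilon_{i0})
         + \sum_{s=0}^{t-1} \rho^{s} \varepsilon_{i,t-s}
\]
```

 In the first case the expected response of unit i settles at \alpha_i / (1 - \rho), so units differ permanently. In the second case the influence of \alpha_i decays geometrically and all units drift toward the same level.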

 Two, the entire identification strategy here is based on the additivity of the model, correct? If we were to draw a directed acyclic graph of these models, it would be obvious that we could never identify this model nonparametrically. I understand that we sometimes need to use models to identify effects, but should these identifications depend so heavily on the functional form? It seems that this problem is tied up with the first. We are allowing for the unit-specific effect as a way to free the model of unnecessary assumptions, yet this forces our hand into making different, perhaps stronger assumptions to get identification.  
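
 A small hypothetical case shows how much work additivity is doing. Suppose, purely for illustration and in notation that is my assumption, that the unit effect changed the persistence rather than shifting the level. The first line is that model and the second is its first difference.

```latex
\[
y_{it} = (\rho + \alpha_i) \, y_{i,t-1} + \varepsilon_{it}
\]
\[
y_{it} - y_{i,t-1} = (\rho + \alpha_i) \, (y_{i,t-1} - y_{i,t-2})
                     + (\varepsilon_{it} - \varepsilon_{i,t-1})
\]
```

 Here the differencing does not remove \alpha_i, so the trick only works because the unit effect enters as a separate additive term.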

 Please clear up my confusion in the comments if you are more in the know.